This dissertation investigates the applicability of artificial neural network systems
to  preliminary  engineering  design  tasks.  Synthesizing  new,  possibly  innovative  designs  by 
exploring  the  development  of  structural  topologies  and  determining  their  possible 
behaviors are two steps of preliminary design on which this research concentrates. These two
areas  of  preliminary  structural  design  have  proven  difficult  for  design  researchers.  Using 
the  neural  network  approach  toward  these  tasks  is  feasible,  but  issues  such  as  representing 
design  problems  in  neural  networks,  collecting  good  design  examples,  and  measuring 
network  performance  are  still  unresolved. 

This  research  begins  by  examining  philosophies  of  design,  which  provides  a  basis 
for later discussions. In particular, the influence of design automation and computational
models of design processes on the science of design is considered.

Next,  this  work  provides  an  introduction  to  artificial  neural  networks.  Two  classes 
of  neural  models,  constraint  satisfaction  and  supervised  learning  models,  are  examined  in 
depth.  The  constraint  satisfaction  model  is  later  used  for  development  of  a  system  for 
qualitative evaluation of preliminary designs. Supervised learning models provide the
cornerstone for development of a model that uses induction in an attempt to learn from
design  examples,  generalize  results,  and  generate  preliminary  structural  designs. 

A major bottleneck in developing most knowledge-based systems is acquiring and
representing requisite knowledge. Supervised learning models of connectionism have the
potential to alleviate this obstacle. The second neural network system discussed and
demonstrated is a hybrid backpropagation model. This system can learn from examples of
previous designs and is able to generate new designs.

In  addition  to  design  issues,  the  discussion  of  connectionist  models  includes  details 
of  the  different  models,  their  performance,  attributes,  integrity,  and  shortcomings.  The 
results  of  this  research  are  an  initial  investigation  into  connectionism  as  applied  to  design. 
Both  connectionism  and  the  theory  of  design  are  relatively  young  in  terms  of  formal 
research  when  compared  to  traditional  areas  of  engineering  and  science.  This  work 
contributes  to  the  maturing  effort  and  identifies  promising  areas  for  further  research. 

INTRODUCTION

The  objectives  of  this  chapter  are  to  provide  a  general  framework  for  preliminary 
structural  design  processes  considered  in  this  research  and  identify  the  role  of  neural 
network  systems  within  this  framework.  This  chapter  also  includes  an  overview  of  the 
approach  taken  here  and  the  motivations  for  exploring  neural  network  techniques.  The  last 
section  of  this  chapter  gives  the  organization  of  the  remainder  of  this  thesis. 

Computational  Models  of  Design  Processes 
Computational  models  of  design  concentrate  on  two  broad  areas.  First, 
computational  models  focus  on  how  computers  can  design  or  assist  in  designing  artifacts. 
In  this  area,  computational  design  models  can  describe,  replicate,  or  simulate  the  cognitive 
process  that  human  designers  employ,  or  they  can  describe  how  a  computer  can 
accomplish some design task. These models can be, but need not be, derived from observation
of human designers. Second, they can serve as a controlled environment for
research into design theory. By providing a design system that can reproduce results in a
consistent, logical fashion, computational models allow design researchers to examine
different processes and theories about the nature of design itself.

Until recently, computational models of design have concentrated on designing
solely for function and fit. An artifact's preliminary design has ignored the implications of
manufacture, maintenance, process planning, and inspection as well as other life-cycle
issues such as disposability. Designers have traditionally considered these issues only
after important design decisions and commitments have been made, resulting in designs
that could not meet life-cycle requirements from conception to disposal. The economic cost
resulting from this practice has led to a growing interest in what designers call design for
manufacture, concurrent design, simultaneous engineering, and design for the life cycle.
Computational models of preliminary design should address these issues.

Figure 1: Mapping Between Abstraction Levels

Models  of  design  normally  consider  processes  that  map  an  explicit  set  of  design 
requirements  into  a  description  of  a  physically  realizable  object  that  satisfies  those  given 
requirements.  These  models  are  incremental  and  iterative  in  nature  and  consist  of  several 
stages or steps. Bell et al. [Bell91] describe a general process where an abstract design
goes  through  a  series  of  mappings,  iterative  redesign  steps,  and  optimizations  as  shown  in 
Figure  1.  Given  a  set  of  initial  design  requirements,  the  design  process  improves  the 
artifact  through  some  iterative  design  and/or  optimization  procedure  until  it  can  no  longer 
make  further  progress.  At  this  point,  the  computational  process  maps  the  artifact  to  a  less 
abstract  stage  using  all  information  available.  Then  the  design  process  is  repeated.  This 
cycle  continues  until  a  detailed  design  of  the  artifact  results.  If  any  stage  of  this  process 
fails or is unable to continue, then a costly redesign process commences. A redesign
process would typically increase the abstraction level, moving backwards and therefore
increasing the design time.
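
The cycle of refinement and mapping described above can be sketched in code. The
following Python fragment is only an illustration under stated assumptions and is not
the model of Bell et al.: the names improve_until_stalled, design_through_levels, and
toy_levels, as well as the toy integer "design", are hypothetical, and the costly
redesign path that moves back to a more abstract level is omitted.

```python
from typing import Any, Callable, List, Optional, Tuple

Refine = Callable[[Any], Any]             # proposes a changed design at one level
Score = Callable[[Any], float]            # higher means requirements are better met
MapDown = Optional[Callable[[Any], Any]]  # maps a design to a less abstract level


def improve_until_stalled(design: Any, refine: Refine, score: Score,
                          max_iters: int = 100) -> Any:
    """Iterate redesign at one abstraction level until progress stops."""
    for _ in range(max_iters):
        candidate = refine(design)
        if score(candidate) <= score(design):
            break  # no further progress is possible at this level
        design = candidate
    return design


def design_through_levels(design: Any,
                          levels: List[Tuple[Refine, Score, MapDown]]) -> Any:
    """Refine at each level, then map the result to the next level down."""
    for refine, score, map_down in levels:
        design = improve_until_stalled(design, refine, score)
        if map_down is not None:
            design = map_down(design)
    return design


# Hypothetical two-level example (all numbers are invented for illustration).
# Level 1 (abstract): the design is a number of supports; three scores best.
# Level 2 (less abstract): the design is (supports, depth in mm); 500 scores best.
toy_levels = [
    (lambda n: n + 1, lambda n: -(n - 3) ** 2, lambda n: (n, 100)),
    (lambda d: (d[0], d[1] + 100), lambda d: -abs(d[1] - 500), None),
]
```

In this toy case, design_through_levels(0, toy_levels) first settles on three supports
and then, after mapping to the less abstract level, on a depth of 500 mm, returning
(3, 500). How the mapping function itself is obtained is exactly the open question
discussed next.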

What  is  not  clear  about  this  general  mapping  model  is  how  the  mapping  between 
abstraction  levels  is  identified  and  performed.  Human  designers  easily  apply  these 
mappings  at  convenient  times;  however,  it  has  been  difficult  for  existing  computational 
models to mimic this activity. This research attempts to apply artificial neural networks
to this task, one with which knowledge-based systems have had difficulty.

Iterative  Design  Processes 

Nevill et al. [Nevill89a] and Flemming et al. [Flemming92] describe the iterative
design process that occurs between abstraction mappings. These two models are
similar; however, it is illustrative to note their differences. Nevill et al. characterize a
design model with the following phases:

• Evaluation of the status of an artifact's design with respect to the design
requirements,

• Generation of candidate design steps,

• Prediction of implications of the candidate design steps,

• Selection of a candidate design step,

• Implementation of the candidate design step,

• Notification of the implications of that step.

1 It should be noted that in Bell's design process model, iterative redesign and
optimization are not exclusive in that either or both may be employed at an abstraction
level.

Flemming et al. describe a similar incremental, iterative design process model. This
model describes the resulting artifact by its form, function, and behavior, and it involves
four stages: synthesis, analysis, evaluation, and redesign. These stages are described as
follows:

•  Synthesis  is  the  process  of  developing  one  or  more  candidate  forms  given  a  set  of 
design  requirements. 

•  Analysis  is  the  process  of  determining  each  candidate's  behavior. 

•  Evaluation  is  the  process  of  comparing  the  behavior  and  candidate  form  to  the 
requirements. 

•  Redesign  is  the  process  of  further  refinement  and  selection  of  one  or  more 
candidate  artifacts  using  information  gained  from  the  evaluation  of  current  and 
earlier  candidate  designs. 
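
The four stages listed above can be illustrated with a deliberately small example. The
sketch below is an assumption-laden illustration, not the model of Flemming et al.: the
beam problem, the class and function names, the steel modulus, and all numerical values
are hypothetical. Analysis uses the standard midspan deflection of a simply supported
beam under a central point load, P L^3 / (48 E I), with I = b h^3 / 12 for a
rectangular section.

```python
from dataclasses import dataclass
from typing import List, Optional

E_STEEL_PA = 200e9  # assumed Young's modulus (typical value for steel)


@dataclass(frozen=True)
class Requirements:
    span_m: float            # L, distance between supports
    load_n: float            # P, central point load
    max_deflection_m: float  # allowable midspan deflection


@dataclass(frozen=True)
class Beam:  # a candidate form: rectangular cross section
    width_m: float
    depth_m: float


def synthesize(req: Requirements) -> List[Beam]:
    """Synthesis: propose candidate forms (fixed guesses in this toy)."""
    return [Beam(0.05, 0.05), Beam(0.05, 0.10)]


def analyze(beam: Beam, req: Requirements) -> float:
    """Analysis: determine behavior, here the midspan deflection."""
    inertia = beam.width_m * beam.depth_m ** 3 / 12.0
    return req.load_n * req.span_m ** 3 / (48.0 * E_STEEL_PA * inertia)


def evaluate(deflection_m: float, req: Requirements) -> bool:
    """Evaluation: compare behavior to the requirements."""
    return deflection_m <= req.max_deflection_m


def redesign(beam: Beam) -> Beam:
    """Redesign: refine a failing candidate by making it 25% deeper."""
    return Beam(beam.width_m, beam.depth_m * 1.25)


def iterate_design(req: Requirements, max_rounds: int = 20) -> Optional[Beam]:
    """Loop over the stages until some candidate satisfies the requirements."""
    candidates = synthesize(req)
    for _ in range(max_rounds):
        for beam in candidates:
            if evaluate(analyze(beam, req), req):
                return beam
        candidates = [redesign(beam) for beam in candidates]
    return None
```

For example, iterate_design(Requirements(span_m=4.0, load_n=10_000.0,
max_deflection_m=0.01)) rejects both initial candidates and accepts the second one after
a single redesign round, at a depth of about 0.125 m. The loop stops at the first
satisfactory candidate rather than searching for a best one.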

Both computational models are iterative. The primary difference between them is that
Flemming et al. explicitly identify the importance of design requirements [Flemming92],
whereas Nevill et al. imply their significance [Nevill89a]. In addition, the Nevill et al.
model considers design as a constraint satisfaction process (i.e., the explicit Notification
step), whereas the Flemming et al. model is less specific and does not prescribe a design
methodology. These differences illustrate that there is no single design process model that
has been accepted by the design community.

As  design  theory  research  has  progressed  over  the  past  decade,  most 
computational models have adopted similar approaches. Figure 2 shows a schematic diagram
of  an  incremental,  iterative  computational  design  model  with  each  of  the  design  stages 
described  by  Flemming  et  al.  [Flemming92]  and  Nevill  et  al.  [Nevill89a].  The  following 
sections  describe  each  design  stage  as  it  pertains  to  this  research. 

The Synthesis Process

The synthesis process maps functional requirements to a description of an artifact's form,
which includes its geometry, topology, shape, and even materials. Finger and Dixon
[Finger89a] call this both conceptual design and configuration design. According to Finger
and Dixon, these are different steps in the sense that in conceptual design an artifact's
function is explicit and is used to generate new designs; however, in configuration design,
an artifact's function is usually implicit and is used to evaluate designs. Both conceptual
and configuration design stages are necessary to generate an artifact's form. Research
models for synthesis processes are only just beginning to appear. This work investigates a
neural network approach to this stage of an iterative design process for preliminary
structural design.

Figure 2: Iterative Design Process

Preliminary  structural  design  offers  a  difficult  problem  area  where  there  are  few 
constraints,  requirements,  and  objectives  that  can  be  expressed  in  algebraic  form.  Formal 
numerical  optimization  is  not  readily  adapted  to  this  task.  The  relationships  between  form 
and  performance  are  not  clearly  defined.  There  is  not  enough  heuristic  knowledge  for 
preliminary  structural  design  nor  even  a  domain  theory,  which  contains  the  knowledge  that 
a system can use in the problem-solving process. Good conceptual structural designers
rely significantly on their past design experience. Thus, mathematical optimization and
knowledge-based approaches to structural synthesis design models have been limited. The
most significant impediment has been acquisition and representation of enough basic and
experiential knowledge for structural synthesis. Therefore, one of the goals of this
research  is  to  investigate  and  identify  promising  approaches  to  acquiring  and  using 
structural  synthesis  knowledge. 

The  Analysis  Process 

The  purpose  of  the  analysis  process  is  to  determine  the  behavioral  characteristics 
of  the  forms  developed  from  the  synthesis  process  with  respect  to  the  functional 
requirements.  At  the  initial  stages  of  design,  attributes  of  an  artifact  often  are  not  yet  fully 
described;  however,  analysis  of  preliminary  designs  is  important  before  mapping  to 
another  less  abstract  level.  Without  information  concerning  behavior  of  partially 
instantiated  artifacts,  these  designs  can  only  be  analyzed  subjectively  and  implicitly. 
This  research  attempts  to  identify  ways  for  better  analysis  of  incomplete  designs  in  order 
to  explore  more  configuration  and  conceptual  alternatives  during  synthesis.  Since  a 
general  configuration  without  fully  instantiated  attributes  can  result  from  the  synthesis 
stage,  this  research  demonstrates  a  method  of  qualitative  analysis  of  preliminary  structural 
designs  using  the  design's  functional  requirements  and  first  principles  of  engineering. 

Evaluation  and  Redesign  Processes 

This  research  does  not  attempt  to  explicitly  investigate  the  evaluation  and  redesign 
stages  of  iterative  design  processes. 

A  Role  for  Neural  Network  Systems 
For  design  problems  where  design  requirements  can  be  expressed  in  algebraic  form 
as  constraints  and  an  objective  function,  designers  can  employ  numerical  optimization 
techniques to search the solution space for an optimal solution. Mathematical optimization
attempts to minimize (or maximize) an objective function without violating any
constraints.  In  formal  mathematical  optimization,  constraints  are  hard  in  the  sense  that  a 
solution  cannot  violate  any  constraint.  The  preliminary  design  stage  and  in  particular  the 
synthesis  process  are  extremely  difficult  if  not  impossible  to  directly  cast  into  a 
mathematical  optimization  problem  without  first  assigning  values  to  design  attributes  to 
formulate  an  objective  function  and  any  associated  constraints.  The  lack  of  both  constraint 
and  objective  functions  for  form-function  relationships  makes  numerical  optimization  a 
deficient  approach  for  structural  synthesis  design  processes  at  this  time. 
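
For reference, the kind of problem that numerical optimization presupposes can be written
in the standard constrained form below. The symbols are notation assumed here for
illustration and do not appear in the original text: x is the vector of design
attributes, f is the objective function, and g_i and h_j are the inequality and equality
constraints.

```latex
\begin{aligned}
\min_{\mathbf{x}} \quad & f(\mathbf{x}) \\
\text{subject to} \quad & g_i(\mathbf{x}) \le 0, \quad i = 1, \dots, m, \\
                        & h_j(\mathbf{x}) = 0, \quad j = 1, \dots, p.
\end{aligned}
```

Every constraint here is hard: a point that violates any g_i or h_j is infeasible. The
difficulty noted above is that, during synthesis, neither f nor the g_i and h_j are
available in this explicit form.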

Knowledge-based  preliminary  design  systems  offer  a  different  technique.  Here 
automated  design  systems  use  heuristic  knowledge  about  specific  design  domains  to 
search  through  a  space  of  possible  design  solutions  for  the  one  which  best  satisfies  a  set  of 
design heuristics. The levels of knowledge required for these heuristics vary widely,
from first principles to domain-specific knowledge. Design heuristics are difficult to
develop: the amount and variety of knowledge they require make a collection of
heuristics sufficient for many design domains, structural synthesis in particular, almost
impossible to qualify. In addition, structural synthesis lacks a significant domain theory or
collection of knowledge that designers recognize and follow, which inhibits developing
heuristics. The lack of a domain theory can be attributed to the chaotic and creative nature
of preliminary design processes, which rely heavily on experiential knowledge that is
difficult to qualify or quantify.

Good  designers  in  most  fields  follow  their  personal,  past  design  experiences. 
Experiential  knowledge  combined  with  first  principles  leads  to  "good"  designs.  Experience 
lets  good  designers  be  flexible  in  that  they  adjust  to  unforeseen  circumstances,  take 
advantage of opportunity, divide complicated situations into manageable parts, and create
abstractions  that  simplify  design  processes.  Because  both  numerical  optimization  and 
knowledge-based  design  models  lack  experiential  knowledge  and  a  "complete"  basis  of 
first  principles,  they  have  difficulty  performing  as  well  as  good  human  designers  in  design 
synthesis  processes. 

Human designers readily map between abstraction layers, often without
identifiable rules. Although the abstraction layers they use may be of common types,
different  designers  move  between  these  layers  in  a  chaotic  manner.  More  likely  than  not, 
these  designers  are  relying  on  their  experience  and  intuition.  If  their  foray  into  a  less 
abstract  description  does  not  provide  the  desired  effect,  then  humans  easily  adapt  by 
becoming  more  abstract.  This  type  of  behavior  is  difficult  to  model  using  knowledge-based 
systems. 

Artificial  neural  networks  can  overcome  many  of  these  obstacles.  Neural  networks 
can  learn  from  experiences  and  work  with  large  numbers  of  constraints  or  requirements. 
Learning  allows  neural  systems  to  identify  relationships  and  self-organize  these 
relationships,  producing  a  mapping  between  a  set  of  inputs  and  some  set  of  outputs.  The 
parallel  nature  and  layered  architecture  of  artificial  neural  network  systems  offer  a 
potential  for  working  with  large  amounts  of  interdependent  information  in  a  relatively 
efficient  manner.  Thus,  artificial  neural  networks  have  the  potential  to  organize  and  use 
the  immense  amount  of  requisite  information  characteristic  of  design  problems.  Ivezic  and 
Garrett  [Ivezic92]  state  that  machine  learning  of  synthesis  knowledge  facilitates  a  more 
direct  mapping  of  design  requirements  and  specifications  to  a  realizable  artifact  than 
knowledge-based or mathematical optimization approaches. Therefore, the role of
artificial neural networks in design synthesis domains is to acquire from past design
experiences  relationships  between  specified  design  requirements  and  physically  realizable 
objects  that  satisfy  those  requirements.  A  goal  of  this  research  is  to  investigate  and  identify 
promising  neural  network  approaches  to  preliminary  structural  design  synthesis  that  can 
learn  from  previous  design  experiences  and  efficiently  utilize  large  numbers  of  constraints. 

Overview  of  Approach 

Mapping between abstraction layers, developing structural topologies, and
determining possible behavior of candidate preliminary designs are three areas of
preliminary structural design synthesis that have proved difficult for design researchers.
The approach taken in this research concentrates on investigating neural network
approaches for these tasks. The following assumptions guide this research:

• Design is by nature a multidisciplinary effort involving teams of designers with
different areas of expertise. This research will not attempt to invent a complete
computational design model for general structural design comprising complete,
large real-world projects. Instead, this study will concentrate on small, compliant
structural designs that encompass as many different aspects of the preliminary
structural design domain as is computationally feasible.

• Although design tasks are integrated, the state of current research into design and
neural networks calls for separate undertakings in these areas, primarily for
tractability.

• This study considers three areas of preliminary structural design:

    • qualitative analysis of preliminary structural systems,

    • synthesis of preliminary structural designs,

    • and mapping between abstraction levels.

• Design requirements are identifiable and can be expressed in some manner as goals
and constraints.

• Taking a neural network approach does not require a new artificial neural
network paradigm. This work seeks powerful artificial neural networks but
concentrates on their use.

Based on the above assumptions, the goal of investigating and identifying promising neural
network approaches to preliminary structural design synthesis leads to the following
subgoals:

•  Identify  a  neural  network  approach  to  managing  constraints  in  preliminary 
structural  design  for  analyzing  and  synthesizing  preliminary  designs. 

•  Identify  a  suitable  neural  network  learning  approach  for  acquiring  structural  design 
synthesis  knowledge. 

•  Identify possible representations of abstract concepts and objects that are suitable
for the neural systems and that simplify the design task at hand.

Thus,  this  dissertation  discusses  and  illustrates  the  use  of  artificial  neural  networks  to 
manage  design  constraints  and  to  acquire  and  represent  synthesis  knowledge  for 
preliminary  structural  design  to  achieve  the  goal  of  identifying  useful  neural  network 
approaches  to  preliminary  design  tasks. 

Preliminary  design  models  have  been  limited  by  their  ability  to  acquire  and  reuse 
experiential knowledge. Preliminary design lacks a strong domain theory, which makes
development of computational design models very difficult. An inductive learning
approach  could  acquire  and  then  reuse  knowledge  embodied  in  past  design  experiences, 
which  are  portrayed  as  successful  and  valid  design  cases.  This  knowledge  could  then  be 
brought  to  bear  on  synthesis,  abstraction  mapping,  and  constraint  management  tasks. 

Analysis  of  preliminary  designs  to  determine  the  behavioral  characteristics  of  the 
forms  developed  from  the  synthesis  process  has  also  been  restricted  by  a  lack  of  reusable 
knowledge.  In  these  cases,  knowledge  from  basic  principles  to  domain  specific  knowledge 
is  required;  however,  the  lack  of  specific  values  for  variables  and  the  wide  range  of 
requisite  knowledge  make  this  task  difficult.  By  characterizing  essential  knowledge  as  soft 
constraints  and  minimizing  the  instantiation  of  design  variables,  a  neural  network  approach 
may be used to manage those constraints in a constructive, intuitive manner compatible
with  preliminary  design  processes,  thereby  providing  an  analysis  capability  limited  or 
nonexistent  until  this  time. 
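
The idea of treating knowledge as soft constraints over uninstantiated, qualitative
variables can be made concrete with a small sketch. It is an illustration only and does
not reproduce the network model developed later in this work: the variables, weights,
and rules are hypothetical, and exhaustive enumeration stands in for the relaxation
that a constraint satisfaction network would perform.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, str]

# Qualitative design variables; no numerical values are instantiated.
VARIABLES: Dict[str, Tuple[str, ...]] = {
    "span": ("short", "long"),
    "depth": ("shallow", "deep"),
    "deflection": ("small", "large"),
}


def long_shallow_deflects(a: Assignment) -> bool:
    """First-principles style rule: a long, shallow member deflects a lot."""
    if a["span"] == "long" and a["depth"] == "shallow":
        return a["deflection"] == "large"
    return True  # the rule says nothing about other configurations


# Each soft constraint is (weight, predicate); it may be violated at a cost.
SOFT_CONSTRAINTS: List[Tuple[float, Callable[[Assignment], bool]]] = [
    (3.0, lambda a: a["span"] == "long"),         # given requirement
    (2.0, lambda a: a["deflection"] == "small"),  # desired behavior
    (1.0, lambda a: a["depth"] == "shallow"),     # preference for less material
    (4.0, long_shallow_deflects),                 # physical knowledge
]


def satisfaction(a: Assignment) -> float:
    """Total weight of the soft constraints that the assignment satisfies."""
    return sum(weight for weight, holds in SOFT_CONSTRAINTS if holds(a))


def best_assignment() -> Assignment:
    """Return the assignment with the highest total satisfaction."""
    names = list(VARIABLES)
    candidates = (dict(zip(names, values))
                  for values in product(*VARIABLES.values()))
    return max(candidates, key=satisfaction)
```

With these invented weights, best_assignment() selects a long, deep member with small
deflection (total satisfaction 9.0). The weak preference for a shallow member is
violated because satisfying it would conflict with more heavily weighted constraints,
which is the behavior that distinguishes soft constraints from hard ones.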

This  research  focuses  on  neural  network  systems  that  manage  constraints  and  learn 
synthesis  and  abstraction  mapping  knowledge  for  preliminary  structural  design.  The 
following  are  motivations  for  investigating  artificial  neural  networks: 

•  Artificial  neural  networks  can  learn  complex,  nonlinear  relationships  from  a  sample 
of  input-output  pairs  that  represent  those  relationships.  The  relationships  need  not 
be  explicit.  By  presenting  a  network  with  a  training  sample  of  previous  designs,  it 
may  learn  those  design  relationships  and  develop  an  appropriate  taxonomy. 

•  During  the  learning  stage,  these  systems  store  entities  to  be  represented  as  a 
pattern  of  activity  distributed  over  many  computing  elements.  Since  the  knowledge 
is  stored  in  the  strengths  of  the  interconnections  between  processing  units,  the 
knowledge  about  any  individual  input-output  pattern  pair  is  not  stored  in  the 
connections  of  a  special  unit  reserved  for  that  pattern,  but  it  is  distributed  over  the 
connections  among  many  processing  units.  Distributed  representations  provide  a 
way to implement best-fit searches of a solution space, and they have the ability to
learn new concepts without having to increase the size of memory.

•  Because  knowledge  is  distributed  over  many  processing  units  in  a  trained  neural 
network,  the  system's  response  can  be  insensitive  to  slight  variations  in  input, 
gracefully  degrade  in  these  situations,  allow  for  automatic  generalization,  and 
produce  novel  outputs  [Hinton86]. 

•  Some  neural  network  training  paradigms  demonstrate  inductive  learning  processes 
where  general,  basic  principles  are  derived  from  samples.  Training  sets  must 
include  the  principles  that  the  system  will  learn  in  either  explicit  or  implicit  form. 

•  Because  artificial  neural  networks  can  learn,  the  knowledge  acquisition  bottleneck 
associated  with  knowledge-based  computational  design  models  is  alleviated. 

•  Neural  networks  perform  well  on  tasks  similar  to  design,  where  there  are  large 
numbers  of  constraints,  partial  information,  and  parallel  tasks,  such  as 
combinatorial  optimization,  pattern-recognition,  speech  understanding,  and  vision 
processing  [Fukushima87]. 
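
The first two motivations, learning a nonlinear relationship from input-output pairs and
storing it in connection strengths, can be illustrated with a very small network. The
sketch below is an illustration only and is not the simulator developed in this
research: the class name TinyNetwork, the layer sizes, learning rate, and epoch count
are assumptions, and the exclusive-or problem stands in for a set of design examples.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, float], float]

# Exclusive-or: a standard small example of a nonlinear relationship.
XOR: List[Sample] = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
                     ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyNetwork:
    """Two inputs, one layer of sigmoid hidden units, one sigmoid output."""

    def __init__(self, hidden: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each hidden unit holds [weight for x1, weight for x2, bias].
        self.w_hidden = [[rng.uniform(-1.0, 1.0) for _ in range(3)]
                         for _ in range(hidden)]
        # Output unit: one weight per hidden unit, then a bias as last entry.
        self.w_out = [rng.uniform(-1.0, 1.0) for _ in range(hidden + 1)]

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in self.w_hidden]
        net = sum(w * hj for w, hj in zip(self.w_out, h)) + self.w_out[-1]
        return h, sigmoid(net)

    def train_step(self, x: Sequence[float], target: float, rate: float) -> None:
        """One gradient step on the squared error for a single example."""
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        # Update hidden weights first, using the not yet updated output weights.
        for j, w in enumerate(self.w_hidden):
            delta_h = delta_out * self.w_out[j] * h[j] * (1.0 - h[j])
            w[0] -= rate * delta_h * x[0]
            w[1] -= rate * delta_h * x[1]
            w[2] -= rate * delta_h
        for j in range(len(h)):
            self.w_out[j] -= rate * delta_out * h[j]
        self.w_out[-1] -= rate * delta_out


def mean_squared_error(net: TinyNetwork, samples: List[Sample]) -> float:
    return sum((net.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)


def train(net: TinyNetwork, samples: List[Sample],
          epochs: int = 10000, rate: float = 0.5) -> None:
    for _ in range(epochs):
        for x, t in samples:
            net.train_step(x, t, rate)
```

A typical use is to build TinyNetwork(), record mean_squared_error(net, XOR), call
train(net, XOR), and compare the error afterwards. Nothing in the code states the
exclusive-or rule explicitly; whatever the network learns resides in the values of
w_hidden and w_out, spread across all units.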

Organization
This  section  outlines  the  remainder  of  this  thesis.  The  next  chapter  presents  an 
overview  of  design  theory  and  methodology  research  with  an  emphasis  on  computational 
models  of  design  processes.  Here  a  distinction  between  traditional  knowledge-based 
approaches  and  artificial  neural  networks  is  made  and  the  motivation  for  exploring 
artificial  neural  networks  in  the  context  of  design  is  solidified.  The  third  chapter  reviews 
artificial  neural  networks  in  a  general  sense.  The  fourth  chapter  describes  a  particular  type 
of  artificial  neural  network  called  harmony  theory  and  its  use  as  a  prototype  computer- 
based  model  for  qualitative  analysis  of  preliminary  designs.  The  fifth  chapter  goes  into 
detail  about  the  backpropagation  neural  network  paradigm.  It  describes  the  general  theory 
of  backpropagation  and  includes  enhancements  and  pseudo  second-order  methods  for 
learning.  The  sixth  chapter  details  the  development  and  implementation  of  a  feedforward 
neural  network  simulator  used  in  this  research.  The  seventh  chapter  provides  several 
design  examples  exploring  the  use  of  feedforward  neural  networks  for  preliminary  design. 
The  final  chapter  summarizes  the  results  of  this  research  and  contains  conclusions  and 
recommendations  for  further  work  in  this  field. 


DESIGN  THEORY  AND  METHODOLOGY 

This  chapter  begins  by  providing  a  broad  description  of  engineering  design, 
concentrating on the general theory and methodology. Common design models are then
reviewed, which leads to an examination of computational design process models. By
developing  a  broad  description  of  design  and  reviewing  current  work  in  engineering  design 
theory  and  in  the  development  of  computational  models  of  design  processes,  this  chapter 
provides  a  perspective  and  the  motivation  for  succeeding  connectionist  computational 
design  models  described  in  later  chapters. 

What  Is  Design 

The proper study of mankind is the science of design. [Simon69, page 83]

In The Sciences of the Artificial, Simon introduces the possibility of creating a science or
sciences  of  design.  In  this  series  of  essays,  he  shows  that  it  is  possible  to  explain  an 
artificial  science  (as  opposed  to  a  natural  science)  and  illustrate  that  artificial  science's 
nature.  His  two  illustrative  examples  of  artificial  sciences  in  this  book  are  the  fields  of 
cognitive  psychology  and  engineering  design.  Simon's  work  was  one  of  the  first  such 
essays  to  challenge  design  researchers  to  explore  and  define  their  science,  and  it  has  not 
been,  by  any  means,  conclusive.  Design  researchers  [Dixon88,  Finger89a,  Finger89b, 
French91,  Hajela88,  Mostow85,  Tong87]  have  expressed  a  need  for  research 
methodologies  and  formalisms  in  design  research.  If  we  examine  what  Simon  means  by  a 
science of the artificial, we can gain some insight into the question: what is design? After
defining  design,  we  can  then  critically  examine  the  theory  and  some  existing  computational 
models  of  design. 

If  a  natural  science  is  knowledge  about  natural  phenomena  and  objects  that  occur 
in  nature  without  human  intervention,  then  an  artificial  science  is  knowledge  about 
artificial  phenomena  and  objects.  Simon  identified  four  distinguishable  indicators  for 
artificial  phenomena  and  objects.  They  are  as  follows: 

1.  Artificial  phenomena  and  objects  are  synthesized  by  humans. 

2.  Artificial  phenomena  and  objects  can  imitate  natural  things. 

3.  Artificial  phenomena  and  objects  can  be  characterized  by  how  they  function,  how 
they  attain  goals,  and  how  they  have  been  adapted. 

4.  Artificial  phenomena  and  objects  are  discussed  and  described  in  not  only 
descriptive  terms  but  also  in  imperative  terms  that  detail  desired  functioning  or 
goal  achievement. 

Using  Simon's  four  indicators  of  artificial  phenomena  and  objects  and  relating  them  to 
design  processes  and  artifacts,  we  can  characterize  design  as  a  process  that  synthesizes  an 
artifact that functions to attain some specified goal or goals. We must take note, however,
that the term artificial, as used in artificial phenomena and objects and in artificial
science, describes real, not imaginary, artifacts and knowledge. It is important to
emphasize that these artifacts function to achieve specific goals.

Computationally,  attainment  of  design  goals  should  not  be  considered  an 
optimization problem. Christopher Alexander, in his book Notes on the Synthesis of Form,
states:


A  design  problem  is  not  an  optimization  problem.  In  other  words,  it  is  not  a  problem 
of  meeting  any  one  requirement  or  any  function  of  a  number  of  requirements  in  the 
"best  possible"  way.  .  .  .  For  most  requirements  it  is  important  to  satisfy  them  at  a 
level which suffices to prevent misfit between the form and the context,1 and to do
this  in  the  least  arbitrary  manner  possible.  [Alexander64,  page  99] 

Thus,  design  attempts  to  satisfy  requirements  rather  than  optimize  for  those  requirements. 
This  is  particularly  evident  in  preliminary  design  because  in  preliminary  design  we  rarely 
have  a  method  for  finding  an  optimum  since  these  types  of  design  problems  have  limited 
available  quantitative  information.  When  comparing  preliminary  design  solutions,  we 
usually  use  qualitative  terms  such  as  "better"  and  "worse"  rather  than  quantitative  terms. 
This is not to say that optimization methods are unimportant design tools. They are
actually underutilized, particularly at later stages of design where numerical optimization
techniques can be readily applied, but in preliminary design, they are unsuitable for direct
application because of the qualitative nature of design requirements.
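
The distinction between satisfying requirements and optimizing for them can be shown
with a short sketch. It is an illustration under stated assumptions: the function names
satisfice and optimize, the candidate designs, and the threshold values are all
hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

Design = Dict[str, float]
Requirement = Callable[[Design], bool]


def satisfice(candidates: Iterable[Design],
              requirements: List[Requirement]) -> Optional[Design]:
    """Return the first candidate that meets every requirement, if any."""
    for design in candidates:
        if all(meets(design) for meets in requirements):
            return design
    return None


def optimize(candidates: Iterable[Design],
             objective: Callable[[Design], float]) -> Design:
    """Return the candidate that minimizes a single objective function."""
    return min(candidates, key=objective)


# Invented candidate designs and requirement thresholds.
toy_candidates: List[Design] = [
    {"cost": 9.0, "stiffness": 2.0},
    {"cost": 6.0, "stiffness": 5.0},
    {"cost": 4.0, "stiffness": 3.0},
    {"cost": 7.0, "stiffness": 8.0},
]
toy_requirements: List[Requirement] = [
    lambda d: d["cost"] <= 7.0,       # must not be too expensive
    lambda d: d["stiffness"] >= 4.0,  # must be stiff enough
]
```

Here satisfice(toy_candidates, toy_requirements) returns the design with cost 6.0 and
stiffness 5.0, the first one that avoids misfit on both requirements, whereas
optimize(toy_candidates, lambda d: d["cost"]) returns the cheapest design, which fails
the stiffness requirement. Optimizing one function of the requirements does not by
itself guarantee that each requirement is met at a sufficient level.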

Design  Requirements 
Design  is  the  synthesis  of  an  artifact  that  satisfies  requirements.  These 
requirements  or  design  goals  help  define  the  function  and  purpose  of  an  artifact.  By 
making  incremental  steps  to  satisfy  each  requirement,  design  requirements  may  help  guide 
a  design  process.  Explicit  statement  of  each  design  requirement  helps  describe  what  a 
design  must  achieve;  however,  general  requirements  cannot  be  uniformly  described  for  all 
the  varied  phenomena  that  designers  encounter. 

1 Alexander defines a form as a solution to a design problem, and a context outlines a
design problem's requirements.

Since we cannot really expect to give complete descriptions of all design
requirements for complex design problems, how can we expect to generate design
alternatives that satisfy requirements that we cannot describe? Here lies the designer's
paradox:

By  designing  an  artifact  that  satisfies  given  requirements,  we  can  identify  further 
requirements  or  more  details  of  the  given  requirements  that  were  unknown  or 
unforeseen  during  the  initial  stages  of  our  design. 

In other words, the context of a design and the design's form are complementary. For
innovative design cases, prototyping and simulation are important design tools since they
let us explore both a design's form and context. Another way to explore a design's context
is to take incremental steps towards satisfying a design artifact's known requirements.
This allows for a careful and critical review of current design requirements and facilitates
"stepping back" to previous design states when new or more detailed requirements and
goals are discovered. Some researchers refer to this as a redesign stage, which can be a
costly process.

Typically, a design starts as a problem statement containing one or more abstract
requirements. Abstract requirements are typically fuzzy or incomplete descriptions of
design criteria. Most often they do not adequately portray the intentions of a design
problem because they are not explicit; that is, they do not reference specific design
values. Therefore, abstract requirements must be transformed into more detailed ones
before continuing on to further design steps. Alexander states:

Physical  clarity  cannot  be  achieved  in  a  form  until  there  is  first  some  programmatic 
clarity  in  the  designer's  mind  and  actions;  and  that  for  this  to  be  possible,  in  turn,  the 
designer  must  first  trace  his  design  problem  to  its  earliest  functional  origins  and  be 
able  to  find  some  pattern  in  them.  [Alexander64,  page  15] 

It  is  important  to  note  that  Alexander  does  not  require  a  complete  transformation 
of  abstract  requirements  into  detailed  design  goals.  If  we  could  identify  and  transform 
every abstract requirement into a set of detailed design goals, we could formulate a
numerical optimization problem. A design's context, no matter how hard we try to define
it,  is  a  field  problem  in  which  we  have  some  forces  that  are  too  difficult  to  understand. 
Another  way  to  state  the  designer's  paradox  is  that  understanding  a  design's  context  is  the 
same  problem  as  synthesizing  an  artifact  that  does  not  violate  that  context.  For  complex 
design  problems,  the  designer's  paradox  becomes  even  more  difficult  since  we  cannot 
always  understand  the  context  without  violating  it. 

If  we  return  to  the  concept  of  satisfying  requirements,  an  underlying  motivation  for 
satisfaction  arises  from  the  previous  discussion  on  the  nature  of  design  requirements. 
Because  we  can  rarely  completely  describe  a  design's  context  and  requirements,  what  is 
meant  by  satisfaction?  What  does  Alexander  mean  by  meeting  requirements  in  the  best 
possible  way?  The  answer  lies  not  in  identifying  what  is  good  but  in  recognizing  what  is 
not  bad.  Since  we  cannot  fully  understand  a  design's  context  but  we  can  recognize  if  an 
artifact  does  not  violate  what  we  do  understand,  then  a  design  that  does  not  violate  any 
design  requirements  (misfit  between  form  and  context)  satisfies  those  requirements  and  is 
a  good,  acceptable  design  solution.  Obviously,  if  we  could  maximize  the  achievement  of 
each  design  goal,  then  we  could  logically  say  that  the  resulting  design  artifact  not  only 
satisfies  identified  design  requirements  but  is  also  the  best  design.  Thus,  satisfying  design 
requirements  is  simply  to  prevent  misfit  between  form  and  context. 
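
The idea of satisfaction as the absence of misfit can be sketched in a few lines of Python. This is only an illustration: the requirement names, the limit values, and the `misfits` and `satisfies` helpers are hypothetical and are not part of this research.

```python
from typing import Callable, Dict, List

# A requirement is a check that returns True when the design does NOT violate it.
Requirement = Callable[[Dict[str, float]], bool]

def misfits(design: Dict[str, float], requirements: Dict[str, Requirement]) -> List[str]:
    """Return the names of the requirements that the design violates."""
    return [name for name, holds in requirements.items() if not holds(design)]

def satisfies(design: Dict[str, float], requirements: Dict[str, Requirement]) -> bool:
    """A design is acceptable when no known requirement is violated."""
    return not misfits(design, requirements)

# Hypothetical limits, chosen only to make the example concrete.
requirements: Dict[str, Requirement] = {
    "displacement": lambda d: d["displacement"] <= 0.01,
    "stress": lambda d: d["stress"] <= 250.0,
}

design_a = {"displacement": 0.008, "stress": 240.0}
design_b = {"displacement": 0.004, "stress": 300.0}

print(satisfies(design_a, requirements))  # True
print(misfits(design_b, requirements))    # ['stress']
```

The check never ranks one acceptable design against another; it only reports violations of what is currently understood about the context.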

Not  only  do  designers  concern  themselves  with  misfit  between  form  and  context, 
but  they  also  must  contend  with  design  requirements  that  often  conflict.  This  is  a 
characteristic of design problems that often makes design decisions a trade-off. Instead of
making independent decisions by selecting the best subsystems to prevent misfit, we must
consider the interactions within a design domain's context and among the subsystems
that make up the design. Interrelated and conflicting design requirements are a driving
force  behind  studying  design  theory,  automating  design,  and  creating  computational 
design  process  models  because  they  may  help  designers  create  better  designs  in  a 
complicated,  ever  changing  environment. 

As  an  example  of  abstract,  coupled,  conflicting  requirements,  consider  the  design 
of  a  simple  beam  to  safely  and  economically  resist  a  load.  From  this  statement  we  can 
identify  the  primary  design  requirement  of  kinematically  resisting  a  load.  Any  beam  we 
design  will  suffice  if  it  will  resist  the  given  load  in  a  stable  fashion.  This  is  the  basic 
performance  requirement  for  our  design.  Considering  the  two  additional,  implied 
requirements,  we  have  in  total  three  abstract  requirements:  load  resistance,  economy,  and 
safety,  which  help  define  the  context  of  this  design.  Each  of  these  requirements  is 
indeterminate  since  we  still  lack  information  and  details  such  as  the  magnitude,  location, 
or  direction  of  the  loading;  we  do  not  have  a  "definition"  of  economical,  nor  do  we  have  a 
"definition"  of  safe. 

Transforming  as  many  abstract  design  goals  as  possible  into  a  set  of  more  detailed 
goals  or  specifications  is  necessary  before  any  design  can  continue,  particularly  when 
determining  interactions  among  design  requirements.  Many  design  problems  require  much 
more  information  about  the  design  situation  than  abstract  requirements  state.  Even  in  this 
simple  example,  we  need  such  information  as  available  support  conditions,  materials,  even 
fabrication  methods  before  we  can  consider  possible  design  solutions.  In  essence,  we  are 
exploring  a  design  context  by  trying  to  define  a  design  search  space  of  possible  design 
artifact alternatives.

For this example, we will consider a vertical load at the center of a span of length, L;
we will consider any combination of fixed, pin, and roller supports at the ends of the span
as shown in Figure 3; we will choose from constant circular, I-shape, or channel cross
sections as shown in Figure 4. (Appendix A shows three tables of cross section dimensions
used in this example.) We will not specify any materials; however, we will assume that any
materials would equally resist tensile and compressive forces. We will think about economy
by making the beam as light as possible (based on normalizing the cross-sectional areas)
and considering fabrication and maintenance costs. Fabrication and maintenance costs are
dealt with by assigning two cost factor types and respective importance multipliers. The
first cost factor considers the type of each support (shown in parentheses below each
support in Figure 3). The second cost factor takes into account matching the type of
section chosen to the type of support. Table 1 shows a matrix of fabrication and
maintenance costs for each section and support type. Each of these cost factors (weight,
fabrication, and maintenance) can be scaled by an importance multiplier, which adjusts the
relative influence of these costs with respect to the other requirements. The scaled cost
factors will be added together based on the number of supports to get a total cost
estimate. Our safety requirement will be transformed into low displacement and low stress
goals.2

Figure 3: Support Types (Fixed, Pin, Roller)


2 It is important to recognize that the more we define a design's context, the more
defined the forms that fit that context become.

Figure 4: Cross Section Types

Table 1: Support/Cross-Section Costs

Each abstract requirement has been transformed, in some sense, to a more detailed
design  goal  as  summarized  in  Figure  5.  This  transformation  is  an  important  step  as  we 
explore  the  design  context.  The  transformation  is  based  on  an  interpretation  of  what  we 
are  trying  to  achieve  and  our  design  experience.  Experiential  knowledge  is  a  human 
designer's  greatest  asset  and  is  what  most  computational  models  for  design  processes 
attempt  to  codify  and  emulate.  An  additional  feature  of  our  detailed  design  goals  is  that 
we  can  estimate  the  level  of  achievement  of  each  in  some  way.  Stability  is  easily 
determined  from  statics  and  is  Boolean,  true  or  false;  weight  is  based  on  the  total  volume 
of  material;  fabrication  and  maintenance  costs  are  based  on  the  values  given  in  Table  1  and 
Figure  3.  Solid  mechanics  lets  us  measure  the  magnitudes  of  displacement  and  stress  and 
thus rank each design alternative.

Figure 5: Transformation of Design Goals (abstract requirements mapped to detailed design
goals: stability; economy to weight, fabrication, and maintenance; safety/comfort to
displacement and stress)

Designers, whether human or machine, must address each design requirement to some degree
in order to generate "good" designs; however, many design requirements conflict. Opposing
requirements are those for which increasing performance relative to one goal reduces the
level of achievement of another goal. Interactions between requirements make them harder
to achieve than if they are independently considered, and for complicated, large design
problems where interactions between requirements are prevalent, designers need some way
to reduce this complexity.

Figure  6  shows  an  interaction  diamond  for  the  detailed  design  goals  from  Figure  5 
(stability,  displacement,  stress,  weight,  maintenance,  and  fabrication).  Each  line  in  Figure  6 
represents  an  interaction,  and  the  type  of  interaction  is  shown  as  a  minus  sign  (-)  for  a 
conflict  and  a  plus  sign  (+)  for  mutual  benefit  or  no  conflict.  Each  goal  interacts  with 
every  other  goal  but  the  relative  influence  between  goals  varies.  Figure  6  only  shows 
generalized,  qualitative  goal  interactions  and  is  based  on  the  previously  described 
performance  measurements  for  each  requirement. 

As an example of interpreting this diagram, reducing both stress and displacement
has a beneficial effect on the basic performance requirement of stability, and the two
reductions also benefit each other. Reducing either the maintenance or the fabrication
cost, however, tends to conflict with the other three goals such that stress and
displacement will likely increase and overall stability (in a qualitative sense) will
probably decrease. Even for this simplified design problem, tradeoffs are apparent, and in
general, design requirements will conflict more often than we can observe in this
problem.

Figure 6: Goal Interaction Diamond
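
The interactions that the text spells out for Figure 6 can be recorded in a small lookup structure. The following Python sketch is not the representation used in this research: the `INTERACTIONS` table and the `pair` and `related` helpers are hypothetical names, and only the pairs described in the text are entered. The weight interactions are left out because the text does not state their signs.

```python
from typing import Dict, FrozenSet, Set

BENEFIT, CONFLICT = "+", "-"

def pair(a: str, b: str) -> FrozenSet[str]:
    """An unordered pair of goals, since an interaction has no direction."""
    return frozenset((a, b))

# Signs taken from the interpretation of Figure 6 given in the text.
INTERACTIONS: Dict[FrozenSet[str], str] = {
    pair("stress", "displacement"): BENEFIT,
    pair("stress", "stability"): BENEFIT,
    pair("displacement", "stability"): BENEFIT,
}
for cost_goal in ("maintenance", "fabrication"):
    for performance_goal in ("stability", "stress", "displacement"):
        INTERACTIONS[pair(cost_goal, performance_goal)] = CONFLICT

def related(goal: str, sign: str) -> Set[str]:
    """Goals whose recorded interaction with `goal` has the given sign."""
    return {
        other
        for goals, s in INTERACTIONS.items()
        if goal in goals and s == sign
        for other in goals - {goal}
    }

print(sorted(related("maintenance", CONFLICT)))
# ['displacement', 'stability', 'stress']
print(sorted(related("stress", BENEFIT)))
# ['displacement', 'stability']
```

Keying the table on unordered pairs keeps each line of the diagram as a single entry, regardless of which goal is queried.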
An advantage of illustrating these concepts through a simple example is that we
can identify possible classes of solutions, given the design requirements that we are
considering. Satisfying our basic performance requirement of stability within the context
of available support types and locations defines four classes of possible solutions: a
simply supported beam, a cantilevered beam, and two types of indeterminate beams, as
shown in Figure 7.

Another advantage of our simple problem and how we have developed its context is that we
can represent the displacement, stress, and cost requirements with some quantity. To
compare each solution, we can aggregate the level of achievement of each of the
requirements into a single performance index. We will do this by normalizing the level of
achievement of each requirement, then scaling the resulting number. For each class of beam
solution, we can derive the maximum displacement and moment magnitudes from solid
mechanics and use the cross-sectional area as a weight indicator since all solutions must
span the same distance. We will normalize these magnitudes with the minimum values
determined or provided. Both normalizations are done across all cross sections as defined
in Appendix A. We want to minimize the total cost (i.e., maximize the performance);
however, because of the available cross section disparity, we also can apply scale factors
to the weight, fabrication, and maintenance cost to emphasize their importance. Otherwise,
the magnitudes of the displacement and bending values will overpower the other
requirements. We then can add these values to the costs based on weight, cross-section
type, and support type. The resulting numbers are the relative costs that we want to
minimize. Thus when choosing among a number of solutions, we will be confident that our
choice will satisfy to a high degree the specified design requirements. These advantages
are not normally present in complex preliminary design problems; however, we will use them
to illustrate how design requirements interact.3

Figure 7: Possible Beam Solutions (a) through (d)
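
The normalize, scale, and add procedure described above can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the calculation from Appendix A: the section properties are made-up values, the names `BEAM_CLASSES`, `SECTIONS`, `normalize_by_min`, `relative_costs`, and `best` are hypothetical, and the fabrication and maintenance terms are omitted because Table 1 is not reproduced here. The beam coefficients are the standard elastic results for a point load at midspan; the "propped" class is assumed to be fixed at one end and simply supported at the other.

```python
from itertools import product
from typing import Dict, Tuple

# Standard beam results for a point load P at midspan of a span L:
#   max deflection = c_defl * P * L**3 / (E * I),  max moment = c_mom * P * L
BEAM_CLASSES: Dict[str, Tuple[float, float]] = {
    "simply supported": (1 / 48, 1 / 4),
    "cantilever": (5 / 48, 1 / 2),             # deflection taken at the free end
    "propped": (1 / (48 * 5 ** 0.5), 3 / 16),  # fixed at one end, roller at the other
    "fixed-fixed": (1 / 192, 1 / 8),
}

# Hypothetical sections: (area, moment of inertia I, section modulus S).
SECTIONS: Dict[str, Tuple[float, float, float]] = {
    "small I": (10.0, 100.0, 20.0),
    "large I": (20.0, 400.0, 50.0),
}

def normalize_by_min(values: Dict) -> Dict:
    """Divide every value by the smallest one, so the best alternative scores 1."""
    smallest = min(values.values())
    return {key: value / smallest for key, value in values.items()}

def relative_costs(weight_multiplier: float) -> Dict[Tuple[str, str], float]:
    """Relative cost of each (beam class, section) pair; lower is better."""
    alternatives = list(product(BEAM_CLASSES, SECTIONS))
    # P, L, and E are common to every alternative and cancel in the normalization.
    displacement = normalize_by_min(
        {(b, s): BEAM_CLASSES[b][0] / SECTIONS[s][1] for b, s in alternatives})
    stress = normalize_by_min(
        {(b, s): BEAM_CLASSES[b][1] / SECTIONS[s][2] for b, s in alternatives})
    weight = normalize_by_min({(b, s): SECTIONS[s][0] for b, s in alternatives})
    return {
        alt: displacement[alt] + stress[alt] + weight_multiplier * weight[alt]
        for alt in alternatives
    }

def best(weight_multiplier: float) -> Tuple[str, str]:
    """The alternative with the lowest relative cost."""
    costs = relative_costs(weight_multiplier)
    return min(costs, key=costs.get)

print(best(0.0))    # ('fixed-fixed', 'large I')
print(best(100.0))  # ('fixed-fixed', 'small I')
```

With these assumed numbers, ignoring weight selects the largest section, while a heavy weight multiplier shifts the choice to the lighter section. Scaled fabrication and maintenance costs would enter the same sum as additional terms.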

Appendix  A  shows  through  several  design  scenarios  that  by  emphasizing  different 
combinations  of  design  requirements,  designers  cannot  ignore  interactions  between 
requirements.  By  concentrating  on  increasing  the  design's  performance  with  respect  to 
stress  and  displacement,  we  in  essence  disregard  the  other  requirements  by  selecting  a 
propped  beam  with  a  large  I-section.  This  solution  drastically  differs  from  when  we 
accentuate  either  manufacturing  or  fabrication  requirements,  which  indicates  that  a  large 
sectioned  cantilevered  beam  would  be  best.  Overall,  it  is  not  surprising  that  I-sections  are 
generally  preferred  since  they  resist  bending  most  efficiently.  Both  fixed  end  and 
cantilevered  beams  are  often  chosen  where  the  choice  is  a  tradeoff  between  safety  and 
economy. 

3 This design problem, as specified, resembles a structural optimization problem more
than a preliminary design problem because we have specified to a high degree the
design's context. The formula for calculating the performance index would be the
objective function, and the limits to each design requirement would be constraint
functions. By not directly solving this problem using optimization techniques, we resolve
this design problem more in the spirit of preliminary design.

This example directly illustrates several concepts. First, by identifying and then
mapping abstract design requirements into detailed design goals, we further define a
design's context, which in turn helps define a design's form. In general, identification and
transformation of design requirements are not as straightforward as in our simple example.
Mapping from abstract requirements to more detailed ones may be regarded as a process
that combines "discovery" of the unexpected with anticipation of what is known about a
design domain. Those requirements that designers discover during refinement most often
prescribe a redesign phase based on an analysis of a design's failures. Since designers
rarely can completely define a design's context, they must experiment with its form to
clarify the context, thus recognizing new, unanticipated requirements. This symbiosis
(Figure 8) is characteristic of most design problems.

Figure 8: Symbiosis of Design (Context and Form)

Second, design requirements interact to a varying degree. Complexity of design problems
increases dramatically during the design process as refinement defines requirements and
interactions between requirements. These interactions are relationships between design
goals that either inhibit or assist in satisfying those goals. Identifying interactions
and effectively dealing with them are some of the most challenging aspects of design in
general and in developing computational models of design.

Third,  by  formalizing  context,  form,  and  goal  interactions,  both  design  variables 
and  parameters  are  more  easily  identified.  Design  variables  and  parameters  define  specific 
quantitative  and  qualitative  features  of  a  design  that  have  the  potential  for  adding  requisite 
detail  to  the  artifact.  When  a  design  process  has  completely  specified  an  artifact,  all  design 
variables  and  parameters  have  values  within  a  valid  range  for  the  given  design  context,  and 
the design process may stop. What constitutes a proper value for a given design variable
goes back to Alexander's concept of satisfying requirements presented earlier in this
chapter. Provided that the value for a given design variable does not violate either the
context or form of the design, the value for that design variable is acceptable.

In  summary,  design  is  a  process  that  synthesizes  an  artifact  that  functions  to  attain 
some  specified  requirements.  To  accomplish  this  task,  any  design  process  model,  whether 
cognitive  or  computational,  must  in  some  way  do  the  following: 

•  conform  to  abstract  design  requirements, 

•  map  abstract  requirements  to  detailed  requirements, 

•  identify  and  accommodate  interacting  requirements,  and 

•  satisfy  design  requirements. 

The next section of this chapter reviews existing models of design processes. From
this review, a case for investigating and developing connectionist models for design
processes will be made.

Models  of  Design  Processes 
Models  of  design  processes  fall  into  three  general  categories:  descriptive, 
prescriptive,  and  computational.  Although  this  research  focuses  on  development  of 
computational  models,  research  in  both  descriptive  and  prescriptive  models  is  necessary  to 
help  identify  those  areas  of  design  research  and  methodology  that  are  well  understood  and 
those that are not. Design researchers who develop and study descriptive and prescriptive
models base their work on, and direct it towards, human designers. These models
often provide the basis for computational design process models and are therefore
necessary to review.

Prescriptive Design Process Models

Prescriptive  models  can  dictate  either  how  the  design  process  should  proceed  or 
what  attributes  the  design  artifact  ought  to  have.  To  date,  these  two  classes  of  prescriptive 
models  are  disjoint  and  have  little  in  common,  other  than  proposing  a  prescription  on  the 
philosophies  of  design.  Most  books  on  design  processes  offer  prescriptive  models 
[Cross89,  Dieter83,  Ertas93,  Pahl84,  Pugh90,  Ullman92,  Walton91]  that  define  a  strategy 
for  the  design  of  a  quality  product.  The  authors  attempt  to  enlighten  engineering  students 
with  better  ways  to  approach  design  problems.  It  is  interesting  to  note  that  one  of  the 
common  themes  that  these  books  stress  is  that  reading  about  design  is  not  enough  but  the 
student  must  actually  do  design  in  order  to  become  proficient  at  the  process.  Design 
experience  is  a  necessary  ingredient  in  design  education.  This  theme  is  a  common  thread 
among  all  design  researchers. 

Prescriptive  design  processes  describe  a  plan  for  how  to  get  from  the  need  for  an 
artifact  to  the  final  product.  This  plan  ideally  best  utilizes  the  knowledge  at  hand  so  that 
the  artifact  is  of  high  quality  and  is  quickly  and  economically  developed.  Ullman 
[Ullman92]  identifies  the  product  life  cycle  that  is  the  basis  for  the  mechanical  design 
process.  The  product  life  cycle  he  prescribes  consists  of  six  phases: 

1. Specification development/planning - understanding the design problem and
planning for design.

2. Conceptual design - generating and evaluating concepts.

3. Product design - generating the product, evaluating the product, and finalizing the
product.

4. Production - manufacturing and assembling the product.

5. Service - maintaining the artifact while in use.

6. Product retirement - disposing of the product after its design life is completed.

Concurrent design is the simultaneous evolution of the product and the process for
producing it. The process of refining a design from a concept to a manufacturable product
is done concurrently in order to increase the quality of the resulting artifact and make its
manufacture as economical and efficient as possible. Although these six phases are
described in an order that reflects a sequential series of actions, they are often done
concurrently among a design team. Each phase of this design process strategy must be
investigated in the context of the overall design process. Ullman estimates that 75 percent
of an artifact's manufacturing cost is committed by the end of the conceptual design phase
and the product design phase only consumes 25 percent. This means that design decisions
made later in the design process will have little effect on the product's manufacturing cost.

Prescriptive  methods  can  account  for  the  outcome  of  many  possible  artifacts  that 
satisfy  the  requirements  since  the  knowledge  employed  during  the  design  process  is 
designer  dependent.  This  implies  a  dynamic  process.  Designers  apply  two  types  of 
knowledge:  design  process  knowledge  and  domain  knowledge.  Domain  knowledge 
consists  of  information  that  is  directly  and  indirectly  associated  with  design  requirements, 
variables,  terminology,  procedure,  processes,  etc.  that  make  up  a  design's  context.  Human 
designers  constantly  learn  more  about  their  domain  while  practicing  design.  Since  domain 
knowledge  is  so  important  to  prescriptive  models,  designers  rely  on  experience  to  gain 
domain  knowledge.  Design  process  knowledge  affects  how  the  designer  applies  domain 
knowledge,  and  prescriptive  design  models  provide  a  strategy  for  applying  that 
knowledge.  Just  as  different  designers  create  different  designs  that  satisfy  the  same  design 
requirements,  they  also  employ  knowledge  about  the  domain  and  design  process  at 
different  times  and  perhaps  in  a  slightly  different  manner. 



An  argument  can  be  made  that  prescriptive  processes  limit  creativity  in  design; 
instead,  design  processes  should  be  viewed  as  chaotic.  On  the  other  end  of  the  spectrum  is 
the  belief  that  design  processes  should  be  organized  and  disciplined.  What  most  design 
researchers  have  proposed  to  date  are  prescriptive  models  that  fall  in  the  middle  of  the 
spectrum.  Software  engineering  is  an  example  domain  that  employs  a  strict  prescriptive 
process  model  [Conger94,  Rumbaugh91],  and  in  this  domain,  a  strict  model  performs 
well. Software engineers are able to define the context in great detail before the conceptual
design and product design phases begin because they can specify requirements precisely.
In addition, the targeted computing environment can partially define the form
of a software product before software engineers consider the conceptual design.
Therefore, software engineering inherently allows for strictly prescribed design processes,
whereas  some  engineering  design  domains,  particularly  preliminary  structural  design, 
suffer  from  the  designer's  paradox,  which  allows  and  sometimes  requires  an  exploration  of 
the  design  domain  to  gather  further  knowledge. 

As  knowledge  about  a  design  problem  increases,  designers  make  decisions  that 
define  the  form  of  an  artifact.  As  a  form  becomes  determined,  there  is  less  unbridled 
creativity  since  the  emerging  form  adds  constraints  to  the  as  yet  undetermined  aspects  of 
the  artifact.  Violation  of  these  constraints  implies  expense  in  terms  of  design  changes  that 
may  affect  re-tooling  of  manufacturing  resources  and  delay  in  releasing  the  product  to 
market.  Changes  to  a  product  early  in  the  design  process  incur  little  expense  since  only  a 
small  investment  has  been  made;  therefore,  designers  have  more  opportunity  for  creativity 
while  fewer  design  commitments  have  been  made. 



The  conceptual  design  phase  in  most  prescriptive  processes  generates  and 
evaluates  possible  preliminary  solutions  to  a  design  problem.  The  generation  and 
evaluation  of  alternatives,  as  opposed  to  a  single  solution,  is  desirable  since  designers 
rarely  have  a  design's  context  completely  formalized.  Again,  the  designer's  paradox 
dictates  that  we  will  gain  more  knowledge  about  our  design  domain  (and  design  process) 
by exploration. Generating and evaluating alternatives is one such exploration [Pugh90].

A very successful prescriptive design process model that describes the attributes
that the artifact should possess, rather than prescribing the process to attain an artifact,
was developed by Genichi Taguchi [Roy90]. The Taguchi method, as his prescriptive
processes have come to be known, relies on the relationship between design and
manufacturing. The revitalization of Japanese manufacturing since World War II can in
part be attributed to the adoption of Taguchi methods, which strive to minimize the quality
loss of a design over the design's life.

Taguchi  defines  quality  loss  as  the  deviation  from  desired  performance  of  a  design. 
The  system  that  Taguchi  developed  is  statistical  in  nature  and  relies  on  the  development  of 
experiments  for  parameter  and  tolerance  design.  While  Alexander  [Alexander64]  defines  a 
good  design  as  one  that  satisfies  all  design  requirements,  Taguchi  defines  a  good  design  as 
one  that  is  relatively  insensitive  to  uncontrollable  factors  that  might  be  encountered  in 
manufacturing  and  during  the  life  of  the  product.  Taguchi  calls  these  factors  that  can  cause 
a  design's  functional  criteria  to  deviate  from  expected  values  "noise  factors." 

Taguchi  methods  emphasize  designing  quality  into  not  only  an  artifact,  but  also  the 
processes that manufacture an artifact. Taguchi based his methods on three ideas:

1. Quality should be designed into a product and not inspected into it during
manufacture.

2.  Quality  is  best  achieved  by  minimizing  the  deviation  of  design  parameters  from 
target  values. 

3.  The  cost  of  quality  loss  can  be  measured  as  a  function  of  design  parameter 
deviations  in  terms  of  the  overall  life  cycle  of  a  product. 

Quality  improvement  requires  continuous  effort  to  reduce  the  variation  around  target 
values,  and  it  is  very  important  that  this  effort  be  done  early  in  a  design  process  in  order 
for  Taguchi  methods  to  succeed.  The  first  step  towards  improving  quality  is  to  statistically 
reduce  the  deviation  as  much  as  possible.  To  accomplish  this,  Taguchi  methods  call  for 
designing  experiments  using  tables  called  "orthogonal  arrays"  that  serve  to  determine  the 
least  number  of  experiments  and  their  conditions.  The  second  step  to  improving  product 
quality  is  to  reduce  noise  factors  that  can  adversely  influence  the  response  of  an  artifact. 
Machinery wear and weather are examples of what Taguchi considers noise factors.
Taguchi  uses  "outer  arrays"  to  study  the  influence  of  noise  factors  with  minimum  effort. 

To  achieve  desirable  product  quality  through  design,  Taguchi  suggests  a  three 
stage  design  process: 

1.  Systems  design,  which  focuses  on  identifying  a  design's  context  and  determining 
valid  ranges  of  design  variables  and  requirements  for  both  a  product  and 
manufacturing  process.  This  may  include  materials,  process  parameters,  and 
possible  configurations. 

2.  Parameter  design,  which  seeks  to  determine  values  of  design  variables  that 
produce  the  best  performance  of  an  artifact  and  manufacturing  process. 

3. Tolerance design, which fine tunes the results of parameter design by tightening
tolerance  factors  in  order  to  reduce  the  variance  of  the  product  and  manufacturing 
process  with  respect  to  requirements. 

Taguchi  methods  require  designing  experiments  to  identify  acceptable  design 
factor  levels.  Instead  of  setting  up  an  exhaustive  set  of  experiments,  Taguchi  uses 
orthogonal  arrays  to  minimize  the  number  of  required  experiments.  Each  row  of  an 
orthogonal array represents a trial, while each column corresponds to one of the design factors
specified for the process. As an example, if a design has seven factors that can each take on one
of two values, then an exhaustive search of the space would require 2^7, or 128,
experiments.  Using  the  orthogonal  array  method,  Taguchi  can  specify  eight  design 
experiments  that  adequately  explore  the  design  space. 
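The seven-factor example can be made concrete with a short sketch. It builds an eight-trial, two-level array by enumerating three basis columns and forming the other four columns as XOR combinations, then checks the balance property that makes the array orthogonal. The construction is one standard way to obtain such an array and is an assumption here; the function names are illustrative.

```python
from itertools import combinations, product

def l8_array():
    """Build an 8-trial array for seven two-level factors.

    Three basis columns enumerate all 2**3 level combinations; the other
    four columns are XOR combinations of the basis columns.
    """
    rows = []
    for a, b, c in product((0, 1), repeat=3):
        rows.append((a, b, a ^ b, c, a ^ c, b ^ c, a ^ b ^ c))
    return rows

def is_orthogonal(rows):
    """Check that every pair of columns shows each level pair equally often."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        counts = {}
        for row in rows:
            pair = (row[i], row[j])
            counts[pair] = counts.get(pair, 0) + 1
        # All four level pairs must occur, and equally often.
        if len(counts) != 4 or len(set(counts.values())) != 1:
            return False
    return True

array = l8_array()
print(len(array), "trials instead of", 2 ** 7)
print(is_orthogonal(array))
```

Each row is one trial and each column one factor, matching the layout described above.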

The  results  of  these  experiments  can  achieve  one  or  more  of  the  following 
objectives.  First,  they  can  establish  an  optimum  set  of  values  for  design  parameters. 
Second,  they  can  estimate  the  contribution  of  individual  factors  to  the  quality  of  a  design. 
Third,  they  can  estimate  the  response  of  a  design  with  respect  to  the  desired  response 
level. 
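The first two objectives can be illustrated with a main-effects calculation over an eight-trial, two-level array. This is a minimal sketch and not Taguchi's full analysis procedure: the array construction, the response values, and the function names are all assumptions, and the responses come from a made-up model in which only the first and fourth factors matter.

```python
from itertools import product

# Eight-trial, seven-factor, two-level array (assumed construction).
ARRAY = [(a, b, a ^ b, c, a ^ c, b ^ c, a ^ b ^ c)
         for a, b, c in product((0, 1), repeat=3)]

def main_effects(array, responses):
    """Per factor: mean response at level 1 minus mean response at level 0."""
    effects = []
    for col in range(len(array[0])):
        high = [y for row, y in zip(array, responses) if row[col] == 1]
        low = [y for row, y in zip(array, responses) if row[col] == 0]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects

def best_levels(effects):
    """Pick level 1 for factors that raise the response, otherwise level 0."""
    return [1 if effect > 0 else 0 for effect in effects]

# Made-up responses: factor 0 adds 4 units, factor 3 removes 2 units.
responses = [10 + 4 * row[0] - 2 * row[3] for row in ARRAY]
effects = main_effects(ARRAY, responses)
print(effects)
print(best_levels(effects))
```

Because the columns are balanced against one another, the effect of each factor can be estimated separately from only eight trials.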

The best results using Taguchi methods occur in industries characterized by a high-volume,
low-cost manufacturing environment, such as consumer electronics or
automobiles.  In  these  environments,  the  cost  of  a  large  number  of  experiments  is  offset  by 
the  revenue  generated;  however,  low-volume,  high-cost  manufacturing  environments  such 
as  the  aerospace  industry  may  not  be  readily  adaptable  to  Taguchi  methods  since  the  cost 
of experiments is high [Montgomery91].

Descriptive  Design  Process  Models 

Descriptive  design  process  models  differ  from  prescriptive  models  by  attempting  to 
describe  how  human  designers  design  instead  of  suggesting  a  strategy  for  designers  to 
follow.  They  study  the  cognitive  processes  designers  use.  Descriptive  models  fall  into  two 
categories.  The  first  category  uses  protocol  studies  to  investigate  how  designers  design, 
and  the  second  category  creates  cognitive  models  that  are  similar  in  some  way  to  the 
mental  processes  used  by  human  designers.  How  humans  design  is  a  question  that  has  no 
simple answer; it is a human characteristic that is unique and not well understood.
Attempting to answer the question of how we design is in large part the primary
motivation of many researchers who study cognitive modeling. Their results are often in
the  form  of  a  descriptive  design  process.  This  section  reviews  work  done  in  developing 
descriptive  design  process  models. 

Protocol  analysis  involves  presenting  a  subject  with  a  problem  and  asking  the 
subject  to  verbalize  while  solving  the  problem.  Protocol  analysis  tries  to  obtain  a  subject's 
stream-of-consciousness  thoughts.  Typically,  protocol  analysis  requires  some  type  of 
recording  of  protocol  sessions,  either  with  sound  only  or  with  both  sound  and  video 
recordings.  By  analyzing  design  protocol  sessions,  it  is  possible  to  identify  some  of  the 
mental  processes  designers  use  during  design.  A  primary  feature  of  protocol  analysis  is 
that  it  records  all  the  information  a  subject  can  communicate,  whether  verbally  or  through 
drawings  or  gestures. 

There are several inherent weaknesses with protocol analysis, although, considering how
little we know about design processes, its benefits far outweigh the
drawbacks. The first limitation is in the interpretation of protocol sessions. Most researchers
limit bias by having more than one person interpret each session and then negotiate
differences between explanations. The second limitation is that protocol studies of designers examine
individuals, whereas design is generally a team effort involving more than one designer. To
date, no work has been done on applying protocol analysis to design teams. Finally, the
weakest  aspect  of  protocol  analysis  is  that  it  cannot  record  what  happens  when  a  designer 
chooses  not  to  think  aloud  or  when  a  subject  mulls  over  a  problem.  Most  engineering 
design  problems  studied  using  protocol  analysis  require  more  time  than  is  typical  of  other 
disciplines that use protocol analysis; therefore, data gathering is divided into several
sessions  over  the  course  of  several  days  allowing  designers  to  ponder  and  organize  their 
thoughts  outside  of  formal  data  gathering  sessions.  Regardless  of  the  limitations,  design 
researchers  have  discovered  many  interesting  design  "habits"  that  are  consistent  across 
several  design  domains  and  appear  to  be  inherent  human  traits. 

Ullman  and  coworkers  [Ullman87,  88]  have  done  extensive  protocol  studies  of 
mechanical  designers.  They  chose  to  study  all  phases  of  human  design  processes  with  both 
experienced  and  inexperienced  engineering  designers.  Designers  were  recorded  using  both 
video  and  sound  recording  equipment  during  their  entire  design  session,  which  started  by 
providing  subjects  with  abstract  design  specifications  and  proceeded  until  they  produced 
detailed  working  drawings  for  most  of  a  final  design.  The  protocol  analysis  recorded  two 
types  of  design  problems.  One  type  was  the  design  of  a  mass-produced  product  and  the 
other  required  designing  a  unique  one-of-a-kind  product.  The  resulting  protocol  analysis 
contains  descriptions  of  the  form  and  function  of  the  resulting  artifacts  and  also  the 
designer's  process. 

The  results  of  different  protocol  analyses  are  similar  [Adelson88,  Ullman87, 
Ullman88].  The  major  findings  are  as  follows: 

•  Individual  designers  establish  a  single  preliminary  design  early  in  their  process.  As 
they  discover  problems  or  inefficiencies  in  their  original  concept,  they  fix/patch  a 
design  instead  of  formulating  a  different  preliminary  design.  This  single-concept 
strategy  is  contrary  to  what  most  prescriptive  process  models  advocate;  however, 
since  these  protocol  studies  are  of  single  designers,  not  design  teams,  it  is 
reasonable  to  assume  that  individual  designers  cannot  mentally  develop  parallel, 
preliminary  concepts. 

•  Designers  make  extensive  use  of  both  mental  and  written  notes.  Written  notes  are 
particularly  prevalent  when  determining  a  geometric  form  of  an  artifact.  Notes  and 
drawings  can  act  as  informal  evaluation  tools  where  dimensions  and  values  can  be 
calculated. They may also provide visual stimulation for ideas, provide external
memory,  and  facilitate  communication  of  ideas  to  others. 

•  The  strategy  designers  employ  changes  from  being  systematic  to  more 
opportunistic  as  a  design  becomes  less  abstract.  Once  designers  have  explored  a 
design's  context  and  provided  some  form  to  an  artifact,  patterns  in  sketches  can 
trigger  ideas  or  identify  problems  causing  designers  to  immediately  shift  their 
attention  to  that  idea  or  problem.  In  addition,  as  a  design  becomes  more  complex, 
it  is  more  difficult  for  a  designer  to  consider  the  whole  design.  Instead,  designers 
subdivide  a  problem  into  manageable  parts,  which  become  items  of  focus. 
Prescriptive  design  processes  suggest  a  more  systematic  process  rather  than  this 
type  of  opportunistic  behavior. 

•  Designers  try  to  keep  the  state  of  a  design  balanced  by  focusing  on  abstract 
portions.  This  general  strategy  of  keeping  all  parts  of  a  design  at  the  same  level  of 
detail  contradicts  the  opportunistic  strategy  previously  mentioned;  however,  if  a 
design  state  becomes  bottlenecked,  then  the  convenience  of  what  can  be  achieved 
may  alleviate  the  standstill. 

Protocol  studies  can  be  valuable  tools  for  studying  design  processes  humans  use 
and  for  testing  prescriptive  design  process  models.  They  should  be  extended  to  include 
design  teams  rather  than  just  studying  individuals. 

A  cognitive  model  describes  the  processes  and  behaviors  that  constitute  a  skill.  A 
cognitive  model  specifies  a  set  of  mechanisms,  each  with  a  well  defined  function  and 
defined  interactions,  that  transform  a  set  of  inputs  into  outputs.  Since  a  cognitive  model 
describes  a  process  by  employing  well  defined  mechanisms,  it  can  generate  explanations 
and  predictions  about  the  process  that  it  models  [Adelson89].  This  is  a  useful  feature  for 
studying  the  theory  of  processes  such  as  design  that  are  not  well  understood.  Theorists 
can  develop  cognitive  models  from  protocol  studies;  however,  retrospective  reporting, 
where  a  subject  explains  what  was  done,  and  informal  reporting,  where  an  observer 
watches  what  is  done  and  asks  questions,  can  also  provide  the  basis  for  cognitive  models. 
Protocol  studies  are  particularly  useful  in  cognitive  research  and  in  developing  cognitive 
models  for  the  following  reasons  [Adelson89]: 

• Protocol studies can help identify complex, interacting behaviors.

•  Protocol  analysis  is  well  suited  to  test  and  give  support  to  the  explanations 
generated  by  a  cognitive  model. 

•  Protocol  studies  are  normally  done  in  a  natural  setting  that  can  protect  results  from 
any  skewing  that  might  occur  in  an  experimental  setting. 

Regardless  of  the  foundation  for  a  cognitive  model,  most  researchers  recognize 
that cognitive models deserve further investigation [Mostow85]. Research into a cognitive
theory of design is just beginning; it suffers from the lack of a design theory taxonomy within
which to work and from the contrasting approaches used by researchers in computer science
and those in psychology [Dixon88]. As cognitive modeling and protocol analysis
continue, the methods, skills, and strategies employed by designers will be better
understood. Different computational design process models test many aspects of most, if
not  all,  cognitive  modeling  [Adelson88,  Adelson89,  Brown86,  Mitchell85,  Tong87].  This 
is  the  subject  of  the  next  section. 

Computational  Design  Process  Models 

Using prescriptive and descriptive models of how designers design, many
researchers have developed computational models as tools to assist designers, as
autonomous systems, or as experiments to research how designers design or should
design. All three types of process models contribute to the overall understanding of design
processes. Each of these types of models helps in identifying abstractions, variables,
techniques, and general knowledge about design. Although there is no requirement that a
design process model need design as people do,4 protocol studies, cognitive research, and
prescriptive process models study human designers as examples of what computational
models may want to emulate. Computational models give researchers a controlled and
reasonably understood testing ground for their theories [Dixon88].

4  Taguchi did not base his methods on how people design, but prescribed a strategy for
human designers to follow.

Most  computational  design  process  models  map  some  set  of  design  requirements 
into  a  description  of  a  physically  realizable  object  that  satisfies  those  requirements.  As 
previously  illustrated  in  this  chapter,  these  requirements  may  be  very  abstract  or  quite 
detailed;  however,  given  requirements  typically  attempt  to  specify  function,  performance, 
context,  and  available  resources.  The  design  task  performed  by  most  computational 
models  is  to  create  a  design  solution  that  satisfies  these  requirements.  Various  models  take 
advantage  of  special  characteristics  of  their  design  domains,  such  as 

•  Modularity  —  allows  partitioning  design  problems  into  subproblems. 

•  Interactions  —  subproblems  may  not  be  independent,  but  interactions  between 
classes  of  subproblems  may  be  well  defined. 

•  Requirements  —  may  be  specific  or  abstract. 

•  Level  of  Solution  Detail  —  are  detailed  "final"  results  sought,  or  is  an  abstract 
description  expected? 

•  Available  Knowledge  —  some  domains  do  not  have  a  recognized  domain  theory. 

In most computational design process models, particularly knowledge-based ones,
the form, quantity, and availability of design knowledge is a critical and often overlooked
factor in the development of a computational model. Computational models must in some
way acquire, organize, represent, and use a variety of types of knowledge. Oftentimes this
knowledge is experiential, acquired from humans; it may be fundamental to the design domain; or it
may even come from the design process as it unfolds. To complicate matters, the way
computational models represent and use knowledge may take on different forms, from
production rules to objects in an object-oriented environment. There are two important
issues to note. First, the effective combination of different types of knowledge is difficult,
and second, it is difficult to effectively represent many types of knowledge that human
designers efficiently use [Nevill88]. Knowledge representation is a research area in
itself; therefore, this section will not explicitly address different techniques of
representation but will concentrate on the different processes computational models may
apply.

Top-down refinement is a general problem solving method that starts with an
initial, abstract problem specification and refines it by adding detail. In order to do this, a
computational process may have to decompose a problem into smaller subproblems until
basic, primitive operators can fully specify each subproblem. A computational model
usually does this in an iterative manner. Top-down refinement requires that a design
domain have identifiable levels of abstraction and subproblem classes. In addition,
interactions between subproblems must also be consistent and identifiable.
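
The iterative decomposition described above can be sketched in a few lines. This is only an illustration under stated assumptions: the decomposition table and subproblem names are hypothetical, interactions between subproblems are ignored, and a subproblem counts as primitive when no further decomposition is known for it.

```python
# Hypothetical decomposition knowledge: abstract subproblem -> more detailed subproblems.
DECOMPOSITION = {
    "support load": ["choose topology", "size members"],
    "choose topology": ["place supports", "connect load path"],
    "size members": ["select material", "select cross section"],
}

def refine(problem, decomposition):
    """Iteratively refine an abstract problem into primitive subproblems."""
    agenda = [problem]   # subproblems that still need refinement
    primitives = []      # fully specified subproblems, in the order reached
    while agenda:
        current = agenda.pop(0)
        children = decomposition.get(current)
        if children is None:
            primitives.append(current)   # nothing left to decompose
        else:
            agenda = children + agenda   # refine the added detail first
    return primitives

print(refine("support load", DECOMPOSITION))
```

The table plays the role of the identifiable abstraction levels and subproblem classes that the method requires to be known in advance.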

A typical computational model may use many levels of abstraction. MOLGEN
[Stefik80], a molecular genetic design system, uses six abstraction levels, and MOSAIC
[Nevill89a] uses three abstraction levels. Each abstraction level that a computational model
(or a human designer) uses tends to focus on a particular viewpoint or primary concern
of a design. For instance, the first or top abstraction level in MOSAIC is concerned with
developing a path around obstacles in the design space. The results from subproblems at
one level of abstraction are commonly transferred to other subproblems and abstraction
levels to assist in refining a solution. Both designers and computational design models use
abstractions to simplify design problems by ignoring details that are irrelevant to their
viewpoint.

At each level of abstraction, computational models may decompose given problem
specifications  into  smaller  subproblems.  Modularity  of  a  design  domain  and  natural 
separability  lead  to  different  subproblem  classes.  Design  models  commonly  use  both 
decomposition  and  abstraction  techniques  to  simplify  design  problems.  By  dividing 
complicated  design  problems  into  smaller  subproblems,  the  number  and  scope  of  design 
decisions  that  need  to  be  made  at  any  point  in  a  design  process  can  be  controlled; 
however,  for  computational  design  models  that  do  not  acquire  new  knowledge,  good 
abstractions  and  decompositions  must  be  known  and  coded  into  a  computer  model  a 
priori.  In  addition,  conflicts  can  arise  between  subproblems. 

In design domains, independent sets of subproblems are rare, and interactions
between subproblems may result in conflicts or assistance (recall Figure 6, which shows
interacting requirements). Alexander [Alexander64] suggests identifying subproblem
classes  that  have  a  minimum  number  of  interactions,  but  in  design,  interactions  are 
inevitable  and  computational  models  must  either  have  them  identified  a  priori  or  have 
some  means  of  detecting  them.  A  common  method  of  handling  interactions  is  through  a 
process  called  constraint  propagation.  Whenever  an  interaction  occurs,  a  design  process 
model  would  create  a  constraint  that  communicates  the  interaction.  A  computational 
model  using  constraints  must  propagate  them  to  those  subproblems  that  are  affected. 
Steinberg [Steinberg87] notes that constraint propagation is effective but computationally
very expensive. When an interaction causes a conflict or results in some requirement being
unsatisfied, a computational model must be able to resolve the conflict. Most models do
this in one of two ways: backtracking or patching [Tong87]. In backtracking, a
model undoes the design decision or decisions that caused the conflict. Undoing a design decision
typically causes an additional constraint to be posted before the process continues. In
patching  a  design,  a  computational  model  would  replace  part  of  or  augment  a  design  such 
that  the  patching  action  eliminates  the  conflict.  Patching  does  not  require  that  a  model 
revert  to  a  previous  level  of  abstraction. 
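
Constraint propagation and backtracking can be illustrated with a small search over discrete design variables. This is a minimal sketch, not the mechanism of any system cited here: propagation is reduced to simple forward checking, patching is not modeled, and the beam variables, values, and constraints are hypothetical.

```python
def propagate(domains, constraints, var, value):
    """Commit var=value and remove conflicting values from related variables.

    Returns the pruned domains, or None if a variable is left with no value.
    """
    pruned = {name: list(values) for name, values in domains.items()}
    pruned[var] = [value]
    for (a, b), allowed in constraints.items():
        if a == var:
            pruned[b] = [y for y in pruned[b] if allowed(value, y)]
            if not pruned[b]:
                return None
        elif b == var:
            pruned[a] = [x for x in pruned[a] if allowed(x, value)]
            if not pruned[a]:
                return None
    return pruned

def solve(domains, constraints, assigned=None):
    """Depth-first search that abandons a decision when propagation finds a conflict."""
    assigned = assigned or {}
    unassigned = [name for name in domains if name not in assigned]
    if not unassigned:
        return assigned
    var = unassigned[0]
    for value in domains[var]:
        pruned = propagate(domains, constraints, var, value)
        if pruned is None:
            continue  # conflict: undo this decision and try the next value
        result = solve(pruned, constraints, {**assigned, var: value})
        if result is not None:
            return result
    return None  # no value works, so the caller must revise an earlier decision

# Hypothetical beam design: timber may span at most 5 m; depth must be at least 50 mm per metre.
domains = {"material": ["timber", "steel"], "span_m": [8], "depth_mm": [200, 400]}
constraints = {
    ("material", "span_m"): lambda m, s: m == "steel" or s <= 5,
    ("span_m", "depth_mm"): lambda s, d: d >= 50 * s,
}
print(solve(domains, constraints))
```

Here the first material choice is withdrawn as soon as propagation empties the span domain, which is the simplest form of the backtracking described above.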

Bottom-up composition is a design process that is the opposite of top-down
refinement.  Bottom-up  composition  starts  with  a  set  of  design  requirements,  but  instead  of 
refining  a  system  of  subproblems  to  find  a  solution,  a  design  is  constructed  from  a  set  of 
available  components.  The  components  are  combined  in  various  ways,  oftentimes 
exhaustively,  to  create  a  design  that  satisfies  the  requirements.  In  order  to  prevent 
exponential  explosion,  control  structures  guide  the  process  of  building  a  design  from  basic 
components. Two interesting uses of bottom-up composition have found their way into
design models. First, generative grammars for reasoning about spatial and functional
representations have been used with limited success in structural design [Fenves87].
Second, heuristic rule sets have been used to generate possible solutions to a given set of design
requirements [Coyne87]. Both of these methods suffer from a limited knowledge set and
process control problems.

Most  knowledge-based  processes  need  some  type  of  control  system  that  guides  a 
process  by  invoking  rules,  identifying  and  resolving  interactions,  pursuing  certain  goals, 
and  evaluating  the  state  of  a  design.  In  general,  a  control  system  for  computational  models 
of  design  invokes  the  specific  knowledge  and  processes  used  to  solve  a  design  problem. 

Human  designers  readily  move  between  abstractions,  create  design  variables,  and 
even forget previous progress made on a design. On the surface, these attributes do not
seem to be entirely advantageous, but human designers outperform computational
models in terms of creativity, innovation, and dealing with unfamiliar design situations.
These  characteristics  of  human  designers  actually  help  in  difficult  design  situations  since 
humans  can  become  opportunistic  in  the  sense  that  they  change  abstractions  and  their 
focus  because  of  recognizable  patterns  in  their  design  [Ullman87,  Baker87].  Humans  can 
identify  patterns  that  cross  abstraction  boundaries  and  even  design  domain  bounds. 
Associating  design  features  for  pattern  recognition  can  even  mean  conveniently  ignoring 
certain  features  in  order  to  become  opportunistic.  Tong  [Tong87]  suggests  that  changes  in 
abstractions  are  necessary  when  a  design  problem  contains  identifiable  bottlenecks.5 
Unfortunately, opportunistic behavior can appear to be chaotic and difficult to control in
the sense that designers will not always create repeatable designs. Humans constantly learn,
and good designers depend on learning from their design experiences. They cannot be
relied on to create repeatable designs.

Computational  models,  on  the  other  hand,  need  some  type  of  controlling  structure 
if  they  are  to  produce  repeatable  designs,  even  if  opportunistic  behavior  can  be  modeled. 
However, there is no single recognized procedure for guidance in design, let alone for
recognizing and taking advantage of opportunity. Most computational models implement
some control structure that provides guidance in moving between abstractions,
identifying critical design variables, and refining requirements [Mostow85].

5  Bottlenecks occur when a design process cannot continue even though a design is not
fully specified. Conflicting requirements, unspecified design variables, and violated goals
are some examples of conditions that might cause a bottleneck to occur.

Knowledge-based approaches in design have taken several courses. Brown
[Brown86] developed a hierarchy of "specialist" modules, which implement well
understood design alternatives. Design proceeds in a top-down fashion from the abstract
to the detailed level. Because the specialist modules are well defined and understood,
abstraction levels are clearly defined, as are relationships between different requirements
and  design  variables  through  and  between  all  abstraction  levels.  Hayes-Roth  and  Hayes- 
Roth  [Hayes-Roth78]  recognized  the  opportunistic  nature  of  human  planning  processes, 
which  was  reinforced  by  protocol  studies  [Ullman87,  Baker87,  Ullman88].  In  their  model, 
Hayes-Roth  and  Hayes-Roth  proposed  a  common  data  area  called  a  blackboard  where 
independent  and  asynchronous  specialists  would  exchange  information.  These  specialists 
are  similar  to  the  specialists  of  Brown  [Brown86];  however,  Hayes-Roth  and  Hayes-Roth 
left  the  issue  of  controlling  this  process  as  a  topic  for  future  research.  Stefik  [Stefik80], 
working  in  conjunction  with  the  Hayes-Roths,  developed  a  hierarchical  planning  system 
for molecular genetics. Stefik's work dealt with handling interactions between
subproblems in automated planning and design models. In Stefik's work, subproblems are
created  when  simplifying  large  problems  by  decomposing  them  into  smaller  subproblems. 
These  subproblems  normally  cannot  be  solved  independently  since  they  interact;  therefore, 
the  hierarchical  model  of  the  Hayes-Roths  is  hindered  without  some  means  of  effectively 
managing  these  interactions. 

Stefik proposed using constraints, which are requirements that must be satisfied, to handle
subproblem interactions. Under a least commitment control strategy, the subproblems that are
most constrained and specified are refined into more detailed subproblems until they are
fully specified. Decisions about poorly specified, under-constrained subproblems are
deferred until more information becomes available, usually from other interacting
subproblems. Other design researchers [Mittal86, Clinton88, Brown88] demonstrate these
control issues in the mechanical design area. Mittal et al. successfully applied Stefik's
control  strategy  in  an  expert  system  that  designs  paper  handling  assemblies  for  machine 
copiers.  Both  Clinton  [Clinton88]  and  Brown  [Brown88]  applied  similar  control  methods 
to  the  preliminary  design  of  two-dimensional  mechanical  structures. 
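
The ordering rule behind least commitment, as described above, can be sketched as a loop that always decides the most constrained subproblem next and lets that decision narrow the others. This is an illustration only and not Stefik's or Mittal's implementation; the subproblem names, the alternatives, and the `narrow` callback interface are hypothetical.

```python
def least_commitment(options, narrow):
    """Decide subproblems in order of how tightly constrained they are.

    options: dict mapping each subproblem to its remaining alternatives.
    narrow:  callback (decided, options) -> options with alternatives removed
             that conflict with the decisions made so far.
    """
    options = {name: list(alts) for name, alts in options.items()}
    decided, order = {}, []
    while options:
        # The most constrained subproblem has the fewest alternatives left.
        name = min(options, key=lambda key: len(options[key]))
        if not options[name]:
            raise ValueError(f"no alternative left for {name}")
        decided[name] = options.pop(name)[0]
        order.append(name)
        options = narrow(decided, options)  # decisions inform deferred subproblems
    return decided, order

def narrow(decided, options):
    """Hypothetical interactions between three subproblems."""
    remaining = {name: list(alts) for name, alts in options.items()}
    if decided.get("drive") == "belt" and "roller" in remaining:
        remaining["roller"] = [r for r in remaining["roller"] if r == "rubber"]
    if decided.get("roller") == "rubber" and "frame" in remaining:
        remaining["frame"] = [f for f in remaining["frame"] if f != "plastic"]
    return remaining

options = {
    "frame": ["plastic", "aluminum", "steel"],
    "roller": ["steel", "rubber"],
    "drive": ["belt"],
}
decided, order = least_commitment(options, narrow)
print(order)
print(decided)
```

The frame is listed first but decided last, because its decision is deferred until the other subproblems have supplied the information that rules out one of its alternatives.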

Tong [Tong87] describes a control strategy called "opportunistic commitment",6
which  attempts  to  provide  some  of  the  opportunistic  behavior  observed  in  protocol  studies 
[Ullman87,  Baker87,  Ullman88].  Clinton  [Clinton88]  implemented  this  type  of  control 
structure  in  the  MOSAIC  computational  design  model  [Nevill89a,  Nevill89b]. 

Most of the knowledge-based approaches use constraints to manage interactions
between subproblems, goals, and design variables. Because each is a knowledge-based
approach, each system requires a clearly defined set of goals,
constraints, variables, and relationships among these items. Abstractions and the
delineation between abstractions must also be clearly defined in order to implement a
choice of hierarchical plans. When new or undefined conditions arise in these systems,
some constraints remain unsatisfied and the expert system fails. Even though the
knowledge used by these expert systems is brittle, the systems showed that constraints
are an important means of representing some forms of design
knowledge.

6  [Tong87, page 146].

Even though all the computational models previously described in this chapter are
successful to varying degrees in modeling design processes, they have not completely
captured the essence of design. They can model specific design areas with brittle
knowledge and limited interactions, but as they explore their design domains, it is
extremely  difficult  to  expand  their  knowledge  bases  due  to  the  widely  varying  types  of 
knowledge that design domains require. Interactions become more prevalent as a design
domain expands, and as the number of interactions increases, the complexity of managing
these interactions increases exponentially. Knowledge-based approaches to computational
design process models are currently stagnant due to these problems; however, it is the
ground-breaking work of these earlier investigations that has stimulated this research.
The  next  section  of  this  chapter  examines  some  connectionist  design  process  models. 

Artificial Neural Networks in Engineering

Artificial neural network models applied to engineering problem domains have
begun  to  appear  in  the  literature.  Although  very  few  have  dealt  directly  with  some  design 
domain  issues  previously  illustrated  in  this  chapter,  this  section  reviews  several  interesting 
applications.  This  section  does  not  go  into  detail  on  the  theory  underlying  artificial  neural 
networks. 

Stojadinvic  [Stojadinvic90]  investigated  connectionism  as  a  computing  paradigm 
and  its  applicability  to  engineering  design.  His  study  considers  two  aspects  of  neural 
computing.  First,  he  discusses  the  computational  aspects  of  the  paradigm.  His 
investigation  develops  a  foundation  for  the  analysis  of  neural  models  by  defining  models  of 
computational neurons and their assemblies into networks. Specifically, he looks at five
commonly used neural network models: competitive, self-organizing, associative,
stochastic, and backpropagation models. The second part of his study is an analysis of
applying  the  neural  computing  paradigm  from  an  engineering  design  perspective.  He 
divided  the  engineering  design  domain  into  five  limited,  general  categories:  classifiers, 
optimizers, controllers, AI tools, and models of physical processes. He then investigates
two of these areas, classifiers and optimizers, by implementing and experimenting with
several neural models to determine the feasibility of using artificial neural networks in a
design domain. Although his work explores only a limited dimension of design theory, he
gives a good presentation of neural network basics.

Stojadinvic  defines  a  classifier  as  a  process  of  mapping  instances  from  one  domain 
to  instances  of  another,  usually  a  more  structured  domain.  Although  he  does  not  make  the 
connection,  a  general  design  process  is  a  series  of  mappings  from  abstract  to  more  detailed 
model  spaces.  Stojadinvic  seems  more  concerned  with  memorized  input/output  pairs  of 
neural  models  such  as  classifiers  and  associative  memories;  however,  he  does  make  the 
link  between  using  backpropagation  neural  models  in  this  domain  and  their  strong 
generalization  properties.  He  demonstrates  the  classification  problem  using  a  standards 
processing  example7  for  structural  design.  Using  the  "Interactive  Activation  and 
Competition"  neural  model  [Rumelhart86b,  McClelland88],  he  implements  two 
classification  example  networks. 

7  Standards are normative rules used to ensure that designs meet minimum performance
criteria. Standards processing requires identifying those standards that impose limits or
are relevant to the design problem at hand.

The first network is a decision table evaluator that, given a number of conditions,
finds an associated action or set of actions that should take place. His network
performance is comparable to the performance of conventional decision table evaluators.
The neural model has additional features: it handles ambiguous situations by
suggesting rules that are not exact but are sufficiently similar to a given input, and it
can solve the inverse problem, in which, given a desired action or set of actions, the
network  determines  the  required  conditions. 

The  second  example  network  is  a  neural  organization  system,  which  is  a  system  of 
classifiers for a collection of provisions that make up a standard. The organization
system provides a mapping between the requirements of a standard and the behavior
limits.  It  enables  access  to  those  provisions  that  are  important  to  the  design  problem  at 
hand  while  ignoring  those  standards  that  do  not  apply.  This  model  also  performs  well  in 
comparison  to  conventional  organization  systems.  Although  both  networks  accomplished 
the  assigned  tasks,  Stojadinvic  does  note  that  determination  of  neural  model  parameters 
may  be  problem  dependent  and  critical  for  good  network  performance. 

There are several other works [Mooney89, Weiss89, Fisher89] that compare
general symbolic classification algorithms such as ID3 with backpropagation. Backpropagation
compares  well  in  terms  of  the  quality  of  classification;  however,  learning  in  artificial  neural 
networks  generally  takes  more  time.  Classification  applications  are  not  new  to  neural 
networks.  Pao  [Pao89]  provides  a  good  overview  of  neural  networks  in  pattern 
recognition. 

Stojadinvic next investigated optimization applications for neural networks in
design. He does not consider general numerical optimization as a field in which to apply a
neural computational model. Instead, he limits his application to combinatorial
optimization, where the problem is to find an optimal solution among a finite number of
discrete alternative solutions. The best known problem of this type is the traveling
salesman problem, not necessarily an engineering design problem unless the salesman is an
engineer. He created two different neural models, using a Hopfield network [Hopfield82,
Wasserman89] and a Boltzmann machine [Ackley85, McClelland88], for solving the
traveling salesman problem. In general, he found that both types of networks
accomplished the task but did so quite differently, and both had limitations. The Hopfield
network implements an associative memory model that searches the solution space for a
set of values that minimizes the energy of the system. It does this by a process that is
equivalent to simple gradient descent, which is highly dependent on the starting point.
Simple gradient descent finds, as a stable state of the model, a local minimum that lies
within the basin of the starting point. Thus, the results of the Hopfield model depend
entirely on the shape of the energy landscape and an essentially random starting point. The
Boltzmann machine is a stochastic neural model and in Stojadinvic's case performed better
than the Hopfield model. The Boltzmann machine is an optimization process that is
analogous to hill-climbing optimization with simulated annealing and is able to avoid
shallow local minima, but it is restricted by long execution times and can get stuck in
deep local minima. Given this experience, Stojadinvic feels that with some
modifications to the energy landscape and dedicated neural hardware, neural computing
systems will quickly and efficiently handle combinatorial optimization problems. As for
applicability in an engineering design domain, he speculates that some problems in
construction management and resource allocation might be solved this way but does not
give details. The three remaining categories that Stojadinvic considers (controllers, AI
tools, and models of physical processes) were not investigated using any neural computing
model.
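
The contrast between deterministic descent and stochastic search described above can be
sketched on a toy energy function. The Python sketch below is illustrative only: the
four-unit network, its Hebbian weights, the cooling schedule, and all function names are
assumptions made for this example, not Stojadinvic's traveling salesman formulation.

```python
import math
import random

def energy(W, s):
    """Hopfield energy E = -1/2 * sum_ij W[i][j] * s_i * s_j for states s_i in {-1, +1}."""
    n = len(s)
    return -0.5 * sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def hopfield_descent(W, s, max_sweeps=100):
    """Deterministic asynchronous updates; each unit flip never raises the energy."""
    s = list(s)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(W[i][j] * s[j] for j in range(n))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:          # stable state: a local minimum of the energy
            break
    return s

def anneal(W, s, temperatures, rng):
    """Boltzmann-style stochastic updates under a cooling schedule.

    Uphill moves are possible at high temperature. Returns the
    lowest-energy state visited during the run.
    """
    s = list(s)
    n = len(s)
    best, best_e = list(s), energy(W, s)
    for T in temperatures:
        for i in range(n):
            field = sum(W[i][j] * s[j] for j in range(n))
            p_on = 1.0 / (1.0 + math.exp(-2.0 * field / T))
            s[i] = 1 if rng.random() < p_on else -1
        e = energy(W, s)
        if e < best_e:
            best, best_e = list(s), e
    return best

# Hypothetical weights: one stored pattern, symmetric, zero diagonal.
pattern = [1, 1, -1, -1]
W = [[0 if i == j else pattern[i] * pattern[j] for j in range(4)] for i in range(4)]

end_a = hopfield_descent(W, [1, 1, 1, -1])     # settles on the stored pattern
end_b = hopfield_descent(W, [-1, -1, -1, 1])   # settles on its mirror image
best = anneal(W, [1, -1, 1, -1], [2.0, 1.0, 0.5, 0.1], random.Random(0))
```

The two descent runs differ only in their starting state yet end in different stable
states, which is the starting-point dependence noted above. The annealing run trades extra
computation for the chance to leave a poor basin.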

Although  Reich  [Reich91]  did  not  consider  connectionism  in  his  work,  he  extends 
a  machine  learning  approach  to  acquiring  design  synthesis  knowledge  in  order  to  create  a 
more robust and less brittle expert system for designing cable-stayed bridges. Reich uses a
concept formation system called COBWEB for the creation of hierarchical classification
trees. In adapting COBWEB to design, Reich characterizes each design by a list of
property-value pairs. When presented with these lists, COBWEB would classify them in a
hierarchy where the leaf nodes of the tree represented each design and the branches to the
leaves would classify each design based on similar property-value pairs. COBWEB uses
an unsupervised learning system that is statistically based to develop the hierarchy. When
designing  a  bridge,  a  user  would  enter  all  specifications  and  COBWEB  would  find  the  best 
match  in  terms  of  known  designs.  Incomplete  specifications  result  in  retrieval  of  all  leaf 
nodes  below  the  point  in  the  hierarchy  where  the  specifications  could  not  be  met. 
COBWEB's  knowledge  base  is  not  static  in  the  sense  that  it  can  be  continually  expanded 
and  extended  as  new  examples  arise.  In  dealing  with  conflicts  and  ambiguous  design 
specifications,  Reich  developed  his  system  without  complete  autonomy  and  allows  human 
interaction.  Without  human  interaction,  the  system  takes  a  conservative  approach  to 
refining  the  design  and  presents  several  design  alternatives  even  though  each  may  not 
satisfy  the  given  specifications.  Reich  combined  COBWEB's  knowledge  of  design 
synthesis  into  a  "complete"  cable  stayed  bridge  design  system  that  includes  evaluation, 
analysis,  and  redesign.  Some  areas  that  Reich  notes  for  further  work  include  extending  the 
system  to  other  design  domains,  tackling  more  complex  designs,  and  supporting  human 
learning  from  the  knowledge  base. 

Ivezic  and  Garrett  [Ivezic92]  developed  an  artificial  neural  network  for  learning 
and  using  design  synthesis  knowledge  called  NETSYN.  NETSYN's  overall  goal  is  the 
same as that of Reich's work [Reich91], and NETSYN is directly compared to COBWEB. Like
Reich, Ivezic and Garrett assume that collections of property-value pairs are
sufficient to represent designs, and that the synthesis process is equivalent to assigning
property  values.  When  all  properties  have  a  value,  then  a  complete  design  description 
results.  Each  property  value  assignment  is  made  in  some  design  context,  and  there  are 
typically  many  possible  design  contexts. 

An  interesting  new  outlook  that  Ivezic  and  Garrett  propose  is  that  it  is  possible  to 
construct  a  sufficiently  accurate  estimation  of  the  probability  of  each  value  of  each  design 
property  being  used  in  a  given  design  context.  By  using  probabilities  to  classify  designs, 
Ivezic  and  Garrett  ensure  NETSYN  represents  the  many-to-many  relationships  typical  of 
synthesis  knowledge  derived  from  example  sets.  Many-to-many  relationships  arise  during 
design  when  some  variables  are  set  while  others  remain  unspecified.  Human  designers, 
through  experience,  can  consider  already  specified  variables  along  with  design 
requirements  when  specifying  other  variables.  Depending  on  how  a  computational  model 
represents  a  design  domain,  it  may  ignore  many-to-many  relationships. 

Given  a  set  of  known  property  values,  NETSYN  predicts  the  a  posteriori 
probabilities  of  each  possible  unknown  property  value  using  a  backpropagation 
connectionist model. NETSYN is trained using a set of design contexts, each made up
of a number of bound design properties, and a corresponding set of desired design property
value probabilities that are to be determined. The network architecture that NETSYN uses
is  one  in  which  a  small  network  exists  for  each  design  property  value  considered.  Each 
neural  network  structure  acts  as  a  probability  estimation  function  for  a  design  variable. 
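
The quantity that such a network is asked to estimate can be illustrated without a network
at all. The sketch below simply counts how often each value of an unbound property occurs
in each design context. It is not NETSYN's backpropagation architecture, and the property
names, values, and example cases are hypothetical.

```python
from collections import Counter, defaultdict

def posterior_tables(cases, known, unknown):
    """Estimate P(unknown = value | known property values) by counting examples."""
    counts = defaultdict(Counter)
    for case in cases:
        context = tuple(case[name] for name in known)
        counts[context][case[unknown]] += 1
    tables = {}
    for context, counter in counts.items():
        total = sum(counter.values())
        tables[context] = {value: n / total for value, n in counter.items()}
    return tables

# Hypothetical example set; repeated cases carry the probabilistic information.
cases = [
    {"span": "short", "deck": "concrete"},
    {"span": "short", "deck": "concrete"},
    {"span": "short", "deck": "concrete"},
    {"span": "short", "deck": "steel"},
    {"span": "long", "deck": "steel"},
    {"span": "long", "deck": "steel"},
]

tables = posterior_tables(cases, known=["span"], unknown="deck")
```

Here the context `("short",)` maps to two possible deck values with different
probabilities, a small instance of the many-to-many relationships discussed above. Counting
only works for contexts that appear in the example set; a trained network is expected to
interpolate to contexts it has not seen.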

Ivezic  and  Garrett  evaluated  NETSYN's  performance  relative  to  its  own 
capabilities  and  in  comparison  to  two  symbolic  inductive  learning  approaches,  one  of 
which is Reich's COBWEB system [Reich91]. In order to evaluate each of the systems,
they  created  an  artificial  design  problem  that  was  defined  by  eight  design  properties,  each 
of  which  could  take  on  five  different  values.  They  treated  the  first  four  properties  as 
design  specifications  and  the  other  four  as  design  descriptions  that  the  synthesis  process 
would  determine. 

When  evaluating  NETSYN's  capabilities,  Ivezic  and  Garrett  developed  a  total  of 
6250  valid  design  cases,  some  of  which  are  repeated  design  cases  in  order  to  capture  the 
probabilistic  nature  of  the  synthesis  process.  They  partitioned  these  design  cases  into  five 
train-test  collections:  6014  training  cases  and  50  test  cases,  500  training  and  500  test 
cases,  1000  training  and  1000  test  cases,  4000  training  cases  and  2000  test  cases,  and 
5000  training  cases  and  1000  test  cases.  In  testing  NETSYN's  performance  on  the 
artificial  design  problem,  they  looked  at  two  different  scenarios:  when  the  entire  synthesis 
space is available for training and testing (the first train-test collection) and when part of
the synthesis space is available (the last four train-test collections). In comparing
NETSYN  and  symbolic  learning  systems,  COBWEB  was  the  only  symbolic  system  tested 
that  could  capture  the  design  knowledge  in  a  form  that  Ivezic  and  Garrett  desired.  For 
these  tests,  Ivezic  and  Garrett  used  only  the  last  four  train-test  collections. 

NETSYN's performance on the given design problem showed respectable results
of 75% to 80% perfect when the training sets cover a wide range of the synthesis space.
Testing on a smaller number of training samples showed that NETSYN can maintain this
performance. The worst case with a limited training set occurred when only 500
training and 500 test cases were presented to the network. In this instance, NETSYN
showed 70% perfect results. When compared to COBWEB, NETSYN showed
consistently better performance with errors ranging from 6% to 27%.

Based on these limited results, Ivezic and Garrett conclude that connectionism
appears to be a promising approach to acquiring and using design synthesis knowledge,
but they acknowledge that their work was limited and that further extensive research should
proceed. Some of the limitations of NETSYN are as follows:


•  Better representational capabilities: NETSYN requires the input and output
vectors to be mapped to binary processing elements. Using continuous-valued
properties would make the network easier to use and its results easier to interpret.

•  Investigate larger problems: connectionist approaches to most problem domains
begin in small, tractable areas.

•  Apply NETSYN to realistic synthesis tasks.

•  Investigate approaches to incremental learning: most connectionist learning
architectures, including backpropagation, work in such a way that once a training
set has been learned, additional training sets cannot be used to augment what has
already been learned. This limits a network to the knowledge at hand when training
occurs. In the neural network literature, this is referred to as the stability-plasticity
dilemma.

•  Ivezic and Garrett developed an autonomous design model and did not explore
other parts of a general model of design such as constraint management, evaluation
of partial or preliminary designs, and mapping between abstractions.

•  NETSYN performs well using limited knowledge, but Ivezic and Garrett did not
investigate working with conflicting requirements.

•  Finally, all training sets given to NETSYN were randomly generated from the
6250 possible solutions, and these solutions included duplicates. Most supervised
learning algorithms are somewhat sensitive to the choice of training sets, an issue
that Ivezic and Garrett ignore.

Overall, NETSYN shows that computational models of design can be built from
connectionist systems. They perform comparably to symbolic systems both in learning
and in using design knowledge. Where they particularly seem to excel is in self-organizing
and learning from example sets.

Kamarthi and Kumara [Kamarthi93] investigated a similar use for connectionism in
the  design  synthesis  process.  They  consider  the  classification  and  mapping  problems  in 
conceptual  design  as  two  separate  processes.  The  classification  problem  they  examine  is  a 
twofold  operation  that  both  learns  classes  of  design  solutions  from  an  example  set  and 
recalls  one  or  more  of  those  design  solutions.  In  connectionist  terms,  this  is  an  associative 
memory  application.  The  mapping  issue  in  conceptual  design  is  more  generative  and 
challenging  for  any  computational  preliminary  design  model.  The  network  must  learn 
plausible  mappings  between  functional  requirements  and  corresponding  design  solutions 
using  previously  solved  design  problems.  In  implementing  computational  models  for 
conceptual  design,  they  created  four  different  network  architectures  for  comparison  of 
these  two  tasks  using  three  different  network  paradigms.  They  represented  all  the  designs 
as  binary  vectors  of  either  eight  or  eleven  elements.  Each  element  of  the  input  vectors 
represents  a  single  requirement  as  either  on  or  off. 

For  the  classification  problem,  Kamarthi  and  Kumara  used  a  backpropagation 
network  and  an  ART-1  network  and  then  compared  the  two.  The  backpropagation 
network  was  trained  using  identical  input  and  output  vectors  such  that  the  network  would 
auto-associate the design solely based on design requirements. The backpropagation
network Kamarthi and Kumara adopted required long training times and suffered
from the stability-plasticity dilemma that is characteristic of backpropagation networks.
One of the design goals of the ART-1 network architecture is to overcome this
dilemma. The ART-1 network was presented with the same binary vectors and grouped
similar designs into families that could be retrieved. Comparing the performance of these
two  networks,  Kamarthi  and  Kumara  note  the  following: 


•  ART-1  networks  exhibit  dynamic  learning  properties.  They  can  continuously  learn 
new  associations  without  forgetting  old  ones,  provided  the  capacity  of  the  network 
has  not  been  reached. 

•  Backpropagation  networks  are  more  efficient  in  retrieving  previously  learned 
design  problems. 

•  Given  a  design  problem  that  is  different  from  the  learned  set,  backpropagation 
networks  can  generate  a  new  solution  using  features  from  the  previously  learned 
examples.  ART-1  networks  do  not  have  this  ability. 

•  The  backpropagation  architecture  reproduces  only  a  single  candidate  output, 
whereas  ART-1  networks  can  retrieve  one-to-many  relationships  between  design 
specifications  and  design  solutions  since  they  categorize  similar  designs  into 
families. 
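
The grouping of binary design vectors into families, and the ability to add families
without disturbing old ones, can be sketched with a simplified fast-learning ART-1
procedure. This is a minimal illustration rather than Kamarthi and Kumara's
implementation: the vigilance and `beta` values, the function name, and the eight-element
example vectors are assumptions.

```python
def art1_present(templates, x, vigilance=0.7, beta=1.0):
    """Present binary vector x; return the index of the family it joins.

    `templates` is a list of binary prototype vectors and is updated in place.
    """
    def overlap(w):
        return [a & b for a, b in zip(x, w)]

    # Rank existing families by the choice function |x AND w| / (beta + |w|).
    ranked = sorted(
        range(len(templates)),
        key=lambda j: sum(overlap(templates[j])) / (beta + sum(templates[j])),
        reverse=True,
    )
    for j in ranked:
        match = overlap(templates[j])
        # Vigilance test: the template must cover enough of the input.
        if sum(x) > 0 and sum(match) / sum(x) >= vigilance:
            templates[j] = match      # fast learning: keep the shared features
            return j
    templates.append(list(x))         # no family is close enough: start a new one
    return len(templates) - 1

# Hypothetical eight-element requirement vectors (1 = requirement on).
designs = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],   # the first design presented again
]
templates = []
families = [art1_present(templates, d) for d in designs]
```

The second design opens a new family without altering the first, and the first design is
still recalled into its original family when presented again. A higher vigilance value
would split the families more finely.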

Kamarthi  and  Kumara  analyzed  the  mapping  problem  using  backpropagation  and 
another  adaptive  network  called  ARTMAP.  Training  the  backpropagation  network  was 
similar  to  what  they  did  for  the  classification  problem.  The  only  change  was  in  the  output 
vectors,  which  included  additional  binary  elements  to  make  each  design  solution  unique. 
The  disadvantages  and  advantages  of  backpropagation  for  mapping  were  identical  to  those 
for  classification. 

The ARTMAP network is made up of two ART-1 networks along with a Map
Field module. One ART-1 network stores the design requirements and the other ART-1
network stores the design solutions. The Map Field module learns the associations
between the families created by both ART-1 networks. When presented with a new design
problem, ARTMAP could recall both similar design problem solutions and plausible
new design solutions. In addition, the ARTMAP network could perform the inverse
problem; that is, it could recall the plausible design problems associated with a given design
solution.

Of all the networks that Kamarthi and Kumara investigated, ARTMAP seems well
suited to exploring neural network design synthesis tasks; however, it shares the
same benefits and limitations as the ART-1 networks. One limitation that ARTMAP and
ART-1 networks have that backpropagation does not is that they can only work with
binary input and output vectors. As far as applicability as a computational model of
design, Kamarthi and Kumara's networks have the same limitations as those of NETSYN
[Ivezic92], particularly in the area of investigating simplified design problems; however,
their work further supports the notion that connectionist architectures can be used
successfully in preliminary design.

In  another  comparison  with  a  symbolic  design  reasoning  system,  Wilson  and 
Sharda  [Wilson93]  examine  a  neural  network  approach  to  duplicating  the  performance  of 
a  rule-based  expert  system.  The  primary  appeal  that  a  neural  network  offers  according  to 
Wilson  and  Sharda  is  the  inductive  approach  to  knowledge  acquisition  that  does  not 
require  strict  specification  of  IF-THEN  rules  or  other  knowledge  representation  schemes. 
The goal of their study is to assess the performance of a backpropagation type network in
mimicking  a  rule-based  approach  to  design  decision  making.  An  already  existing  expert 
system  for  packer  selection  in  oil  well  design  was  used  to  create  random  training  and  test 
cases  for  the  network  where  one  specific  packer  is  recommended  for  each  case.  A  total  of 
240  cases  were  created  and  divided  into  different  training  sets  in  order  to  investigate  the 
effects  of  different  training  set  sizes.  There  were  a  total  of  eight  input  neurons  representing 
design  specifications,  ten  hidden  neurons,  and  six  output  neurons,  each  representing  one  of 
the  possible  packer  designs. 
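
The size of this architecture can be made concrete with a short sketch. Only the layer
sizes (eight inputs, ten hidden neurons, six outputs) come from the description above. The
sigmoid activation, the random untrained weights, and the function names are assumptions
for illustration, so the recommendation produced here carries no meaning.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layer, inputs):
    """Sigmoid activations of one layer for the given input vector."""
    weights, biases = layer
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]

rng = random.Random(0)
hidden = make_layer(8, 10, rng)    # 8 design specifications -> 10 hidden neurons
output = make_layer(10, 6, rng)    # 10 hidden neurons -> 6 candidate packers

def recommend(spec):
    """Return the index (0 to 5) of the output neuron with the largest activation."""
    activations = forward(output, forward(hidden, spec))
    return max(range(len(activations)), key=activations.__getitem__)

# Number of trainable weights and biases in the 8-10-6 network.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in (hidden, output))
```

With 156 adjustable parameters and at most 240 cases, the ratio of examples to parameters
is small, which is one reason the effect of training set size is worth examining.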

Three different networks were trained with different training set sizes of 120, 180,
and 240 cases. According to Wilson and Sharda, each network performed very well and
was able to correctly classify more than 93% of the cases, and the network trained on the
240-case training set correctly classified 95% of the cases. In comparison to the
performance of the expert system, Wilson and Sharda were able to conclude that none of
the networks had learned the exact upper bound on one of the design variables. Wilson
and Sharda's system did not require binary input or output vectors, and they surmise that
the network fails in those instances where it must learn characteristics
of continuous variables or where hard constraints are required. Wilson and Sharda feel
that exemplar training sets that appropriately represent these hard constraints are needed and
that random generation of cases did not accurately describe the actual case distribution
characteristics. In summary, this work shows that artificial neural networks offer distinct
advantages in knowledge acquisition for tasks that can be solved using rule-based expert
systems and might be especially useful in domains where a domain expert's knowledge is
unavailable.

Berke et al. [Berke93] examine the more routine task of designing several aerospace
structural components using artificial neural networks. Their focus is the application of
neural  networks  to  capture  structural  design  expertise  through  their  ability  to  learn  from 
examples.  Berke  et  al.  recognize  a  major  advantage  of  some  connectionist  approaches 
over  traditional  computational  methods  such  as  numerical  optimization  in  that  a  trained 
network  can  produce  results  with  trivial  computational  effort.  Another  advantage  is  that 
the  predictive  capabilities  of  connectionist  models  are  insensitive  to  numerical  instabilities 
and  convergence  difficulties  typically  associated  with  optimization.  One  disadvantage  that 
Berke  et  al.  see  in  a  neural  network  approach  is  that  the  requisite  number  of  design 
examples  for  training  is  high.  Berke  et  al.  do  not  explicitly  consider  training  times  and 
training convergence problems, which are similar to convergence problems in numerical
optimization. 

Their  routine  design  tasks  were  to  generate  designs  for  a  trussed  ring  and  two 
types  of  wing  sections.  Optimum  design  data  were  used  to  create  125  sets  of  optimum 
minimum  weight  ring  designs  subject  to  stress,  frequency,  and  displacement  constraints. 
The first wing design problem used 15 optimum minimum weight wing designs under
displacement  constraints,  and  the  second  wing  design  problem  used  50  optimum  minimum 
weight  forward  swept  wing  designs  under  displacement  constraints.  For  the  ring  design, 
the input parameters were the inner and outer radii and the frequency limit for a total of
three input units, and the minimum weight and cross-sectional areas of 25 truss bars were
the output units. All input and output units were continuous, real values. The 125 designs
varied  in  weight,  depending  on  their  dimension  and  frequency  requirements,  from  1,000 
pounds  to  150,000  pounds.  The  designs  were  divided  into  a  training  set  that  consisted  of 
120  designs  and  a  testing  set  of  five  designs.  Predictions  displayed  individual  error  rates 
between  0%  and  10%  for  close  to  80%  of  the  variables;  however,  several  design  variables 
showed  much  higher  error  rates.  Berke  et  al.  surmise  that  considering  the  complexity  of 
the  optimum  designs  with  shifting  load  paths  and  the  huge  weight  range,  the  training  and 
predictions are satisfactory relative to an expert human designer. The results for the two
wing designs further support their conclusion that artificial neural networks can predict
optimum designs under difficult design requirements, but several design parameters had
higher errors. Similar to Stojadinvic's findings [Stojadinvic90], Berke et al. feel that
reducing the complexity of the design space will reduce error magnitudes and that increasing
the number of training sets will also help reduce errors. When the design space has
discontinuities, artificial neural networks have a difficult time modeling them. This finding
corresponds to the inability of Wilson and Sharda's packer design system to properly learn
hard numerical limits [Wilson93]. Berke et al. speculate that clustering the training data
and decomposing the network within these clusters may be a viable solution to dealing
with complex design spaces. Berke et al. urge more research to assess the viability and
usefulness of neural networks as expert designers.

Connectionism  as  a  Computational  Model 
There  are  many  different  views  of  what  design  is  or  what  it  should  be.  Design 
research  has  stimulated  these  discussions  and  expanded  our  thinking  into  areas  of  human 
cognition  and  computing  that  several  years  ago  designers  never  considered.  One  of  the 
primary  benefits  from  any  type  of  research  into  design  is  in  the  alternative  dimensions  or 
increased  depth  that  a  viewpoint  proposes  or  illustrates.  Any  discussion  or  critique  should 
not  rank  or  declare  one  model  as  better  than  another.  Every  model  has  strengths  and 
weaknesses,  and  examinations  of  any  paradigm  contribute  to  the  general  discourse  of 
design.  My  previous  research  into  knowledge-based  design  models  provided  the 
stimulation  to  investigate  an  alternative  paradigm,  and  artificial  neural  networks  were 
chosen  for  several  compelling  reasons  that  are  detailed  in  this  section. 

Knowledge-based  systems  focus  on  capturing  the  essence  of  human  reasoning  by 
representing  knowledge  as  a  collection  of  symbols  and  manipulating  logical  statements. 
The  knowledge  may  be  in  the  form  of  rules,  frames,  scripts  or  other  symbolic 
representations.  Connectionism,  on  the  other  hand,  does  not  explicitly  represent 
knowledge  but  concentrates  on  modeling  reasoning  processes  at  a  lower  level  than 
symbolic  reasoning  in  order  to  go  beyond  the  restrictions  imposed  by  sharply  defined 
categories and formal logic. As previously discussed, humans use many reasoning methods
during design. When developing a computational model, the majority of research has
concentrated on such features of design as design rules, design hierarchies, and
the articulation of design processes. Symbolic computational models do a good job of
supporting these areas, but design is more than this. A large part of design is rooted in
experience. Computationally, experiential knowledge goes beyond recalling a specific case
from a database of known designs; it also lies in the emergence of designs from combining
multiple experiences into a new design situation that may be innovative and superficially
unrelated to previous design situations. Humans have difficulty articulating this process,
which some call creative or inventive.

Connectionist  models  that  learn  can  synthesize  new  forms  from  previously  learned 
examples.  This  goes  beyond  classification  problems.  These  generalization  capabilities 
make connectionist models both robust and innovative, provided the training examples
cover many dimensions of a solution domain. Because artificial neural networks exhibit
only  inductive  learning  capabilities,  they  are  not  truly  spontaneous  in  their  solutions,  but 
their  solutions  may  have  a  variety  of  properties,  each  derived  from  learned  examples.  Both 
prescriptive  and  descriptive  design  process  models  suffer  from  a  knowledge  bottleneck. 
Good  human  designers  have  decades  of  formal  training  and  experience  as  a  basis  for 
creating  new  designs.  Humans  continuously  assimilate  their  life  experiences  into  memories 
that influence their creative processes, and only closed-minded people have a knowledge
bottleneck. Computational design process models that learn may help alleviate this
constriction by learning as human designers do, from previous design experiences.

Computational design process models must learn two classes of design knowledge:
domain knowledge and design process knowledge. Design researchers have
different means of communicating these types of knowledge that might depend on the type of
design model (prescriptive or descriptive), the design domain, and the design task (an
autonomous  system  or  only  one  stage  in  a  larger  process  model).  Domain  knowledge  may 
consist  of  abstractions,  requirements,  design  variables,  and  even  design  procedures. 
Design  process  knowledge  is  the  application  of  domain  knowledge.  Both  these  types  of 
knowledge  require  representations  within  a  computational  model. 

Connectionism emphasizes the structure of the human cognitive process as an
artificial  neural  network.  It  is  artificial  in  the  sense  that  the  parallel  processes  that  occur  in 
a  biological  neural  network  are  computationally  simulated  on  serial  computers  and  the 
biological processes are simplified. The primary difference between knowledge-based
models and connectionist models is that, in connectionist models, knowledge is not
explicitly represented in symbolic terms but in the strength of the connections between
"neurons." Neural computational models rely on the notion of memory: in response to some
outside stimulus, the model will recall various "memories."

In the study of design, knowledge-based approaches make the knowledge that
they  use  explicit.  That  is,  the  design  knowledge  itself  is  available  for  study  and  available 
for  creating  explanations.  In  teaching  design  expertise,  this  is  a  very  desirable  attribute, 
and  in  complex  design  environments,  human  designers  may  need  to  check  the  design 
processes  or  product  for  consistency.  Unfortunately,  the  knowledge  base  itself  is  very 
fragile and brittle, which limits this utility.

When learning from examples, artificial neural networks self-organize the
knowledge  in  the  magnitude  of  the  weights  between  processing  units.  Since  the 
representation  of  specific  knowledge  concepts  is  not  explicit  within  the  network, 
connectionist  models  can  represent  a  great  deal  of  information  that  might  be  implied 
within  the  training  examples  such  as  abstract  design  requirements,  mappings  between 
abstraction  levels,  and  interacting  subproblems  and  requirements.  Network  models  do  not 
require  the  control  structure  that  knowledge-based  systems  employ  to  manage  their 
constraints  and  knowledge,  but  their  training  examples  have  a  huge  effect  on  their 
performance.  As  an  example,  if  we  train  a  network  to  recognize  the  first  twenty-five 
characters  in  the  alphabet,  the  network  will  never  be  able  to  recall  the  letter  'z'  since  we 
have  never  given  it  that  association.  What  the  network  will  attempt  to  do  is  characterize 
the  letter  'z'  as  one  of  the  twenty-five  letters  it  does  know  about.  Thus  in  terms  of  design, 
when  we  employ  learning  in  a  network  implementation,  the  network  will  learn  the  implied 
details  about  domain  knowledge  and  about  processes  such  that  it  can  differentiate  between 
each  learned  example.  If  we  do  not  choose  examples  in  sufficient  quantity  or  quality  to 
represent  those  dimensions  of  a  design  domain  that  are  significant,  the  network  will  never 
learn  them. 

When  given  novel  situations,  connectionist  systems  are  able  to  generalize  by 
recalling  the  most  mutually  compatible  and  consistent  memory  from  the  given  stimulus. 
Numerically,  this  is  a  kind  of  relaxation  procedure  that  is  typically  part  of  the  algorithm 
that  simulates  recall.  This  feature  of  connectionism  appears  to  capture  the  essence  of 
design, synthesis, and innovation.

Many design domains are characterized by abstract, conflicting requirements that
provide  a  starting  point  for  a  design  process.  Because  artificial  neural  networks  have  the 
ability to generalize, they are robust in the sense that given conflicting or incomplete
stimuli, they will gracefully degrade by recalling a compatible and consistent solution.
This  is  an  inherent  feature  of  artificial  neural  networks  and  not  part  of  an  additional 
control  structure  that  knowledge-based  systems  would  require.  Recall  that  protocol 
studies  identified  opportunistic  behavior  in  many  designers  that  tended  to  cross  abstraction 
boundaries.  A  network's  generalization  capabilities  emulate  this  same  process. 

Connectionism  is  not  a  complete  paradigm  for  design  computational  models.  Other 
approaches  to  computational  models  may  provide  a  better  structure  for  studying  such 
important  aspects  of  design  as  design  rules,  abstractions  of  design  features,  explanation  of 
reasoning  processes,  and  explicit  evaluation  of  design  artifacts  [Coyne90].  Where 
connectionism  appeals  is  in  the  ability  to  use  experiential  knowledge,  apparent  ability  to 
synthesize  in  novel  situations,  and  the  implicit  ability  to  self-organize  knowledge.  These 
features  of  connectionist  systems  are  the  stimulus  for  this  research. 


ARTIFICIAL  NEURAL  NETWORKS 

This  chapter  provides  a  foundation  for  artificial  neural  networks  by  presenting  a 
brief  historical  perspective  and  the  general  theory  behind  artificial  neural  networks.  In  later 
chapters  details  of  two  classes  of  networks,  stochastic  and  feedforward  networks,  are 
presented,  but  before  any  analysis  of  neural  computing  is  done,  a  solid  base  in  that  field 
must be established. The following sections define what artificial neural networks are and
why they appeal to practitioners in many different fields of research. Along with
other  artificial  intelligence  paradigms,  artificial  neural  networks  have  experienced 
explosive  growth  and  interest,  even  to  the  point  where  some  people  consider  them  as 
another  computing  model,  along  with  numeric  and  symbolic  computing.  Development  of 
artificial  neural  networks  in  their  current  form  began  in  the  1980's;  however,  their  roots 
can  be  traced  back  much  further  as  described  in  the  next  section. 

Historical  Perspective 
Humans have always speculated on how our brain generates and organizes our
thoughts and memories. Both its spiritual and anatomical natures have been examined by
philosophers, theologians, psychologists, physiologists, and anatomists with limited
progress and little agreement. Due to the complex nature of thought processes and the
complexity of even the simplest neural systems, the application of scientific methods
to understanding human thought has been slow.

Neurobiologists and neuroanatomists have made substantial progress in
understanding how the brain and nervous system are put together, but have made little
progress towards understanding their operation. The complexity of the brain is
staggering, with hundreds of billions of neurons, each connected to thousands of other
neurons. Systems of this size dwarf even the most ambitious supercomputers known
to date.

Gradually,  as  researchers  developed  a  rudimentary  understanding  of  the 
functioning  of  the  neuron  and  its  pattern  of  interconnections,  mathematical  models 
have  emerged  to  test  these  theories.  From  these  early  works,  it  became  apparent  that 
even simple models of neurons and their interconnections not only functioned in a
manner similar to the brain, but they also displayed many practical functions beyond just
mimicking  the  brain.  Thus,  even  from  the  early  days  of  neural  network  research,  two 
mutually  reinforcing  objectives  have  emerged.  First,  some  researchers  focus  on 
understanding  the  physiological  and  psychological  functions  of  the  brain,  and  second, 
some  researchers  develop  artificial  neural  systems  that  perform  brain-like  functions. 
This  research  is  directed  to  the  latter.  It  is  interesting  to  note  that  some  of  these  same 
dilemmas  concerning  understanding  human  thought  processes  and  memories  that 
confront  researchers  in  neural  systems  also  face  design  researchers. 

Artificial neural networks made their first appearance in the 1940's.
Neurophysiologists wanting to duplicate the functions of the brain developed simple
hardware models of neurons and their interconnections. Even though these hardware
models were simple, they achieved impressive results. McCulloch and Pitts
[McCulloch43] published the first systematic study of the mathematical foundations
for neural network research that was to follow. Most of their work was in developing
the simple binary threshold neuron model that underlies the perceptron. These systems
generally have a single layer of neurons connected by weights to a set of inputs as shown
in Figure 9. The sigma (Σ) unit multiplies each input, $x_i$, by a weight, $w_i$, and sums
the resulting values. The model then passes these values to a threshold unit that compares
the value to a predetermined threshold value. If the sum is greater than the threshold,
then the output is one, else the output is zero.

Figure 9: Simple Perceptron
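
The weighted sum and threshold comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `perceptron_output` and the example weights and threshold (chosen here so that the unit behaves like a logical AND) are assumptions and do not come from the original work.

```python
def perceptron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    # Sigma unit: multiply each input by its weight and sum the products.
    total = sum(x * w for x, w in zip(inputs, weights))
    # Threshold unit: compare the sum to the predetermined threshold.
    return 1 if total > threshold else 0


# Hypothetical weights and threshold that make the unit act as a logical AND.
AND_WEIGHTS = (1.0, 1.0)
AND_THRESHOLD = 1.5

if __name__ == "__main__":
    for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(pattern, perceptron_output(pattern, AND_WEIGHTS, AND_THRESHOLD))
```

Only the input pattern (1, 1) produces a sum above the assumed threshold of 1.5, so only that pattern yields an output of one.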

Hebb [Hebb49] proposed the first explicit statement of a physiological learning
rule for synaptic modification. Although Hebb's work was not a mathematical
statement, Hebbian learning is the foundation for most network learning algorithms
and is based on Hebb's description:

When  an  axon  of  cell  A  is  near  enough  to  excite  a  cell  B  and  repeatedly 
or  persistently  takes  part  in  firing  it,  some  growth  process  or  metabolic 
change  takes  place  in  one  or  both  cells  such  that  A's  efficiency,  as  one 
of the cells firing B, is increased. [Hebb49, page 50]

Hebb's book encompasses more than just a proposed learning rule; he provides
a useful discussion of the link between psychology and physiology and coined the term
"connectionism." Several other neural concepts were also proposed or recognized by
Hebb. Beyond his learning rule, Hebb also asserts the distributed nature of
representation  that  the  nervous  system  uses.  In  order  to  represent  something,  many 
cells  in  the  nervous  system  must  take  part  in  the  representation.  This  gives  rise  to 
Hebb's  third  concept,  which  postulated  that  cells  are  arranged  in  assemblies.  These 
assemblies  are  interconnected,  self-reinforcing  subsets  of  neurons  that  form  the 
representation  of  information.  Individual  cells  could  belong  to  more  than  one  assembly 
depending  on  the  context,  and  multiple  assemblies  could  be  active  at  any  one  time. 
Thus,  Hebb  proposed  that  there  is  a  distributed  representation  at  both  the  anatomical 
level  and  at  the  functional  level. 

In  the  1950s,  the  world  entered  the  computer  age,  and  artificial  neural  network 
research  benefited  from  this  trend.  Early  neural  network  research  and  simulation  relied 
on  few  mathematical  statements  and  many  wordy  descriptions.  Rochester  et  al. 
[Rochester56]  studied  Hebb's  learning  system  using  a  computer  simulation  of  the 
nervous  system.  This  work  was  one  of  the  first  to  test  a  well  formulated,  detailed 
neural  theory  using  a  computer  simulation.  This  paper  made  an  important  point  about 
neural  network  research  that  is  directly  tied  to  the  computer.  Before  the  computer, 
neural  theories  could  be  proposed,  discussed,  and  analyzed,  but  they  could  never  be 
tested.  Using  a  computer,  researchers  could  test  the  form  and  precision  of  assumptions 
and know whether they work or not and to what degree. Small details could no longer
be ignored, and success would depend as much on the details as on developing the
general theories.

Rosenblatt [Rosenblatt58] proved a perceptron learning theorem which
demonstrated that a perceptron could learn anything it could represent. His work
stimulated many researchers to further investigate the potential of perceptrons.
Rosenblatt's excitement is clear from the following quote:

The question may well be raised at this point of where the perceptron's
capabilities actually stop ... the system described is sufficient for
pattern recognition, associative learning, and such cognitive sets as are
necessary for selective attention and selective recall. The system
appears to be potentially capable of trial and error learning and can
learn to emit ordered sequences of responses ... [Rosenblatt58, page 404]

However,  he  also  recognized  some  of  the  more  serious  computational  limitations  of 
perceptrons  that  still  plague  artificial  neural  networks  today.  He  notes  that  perceptrons 
act  in  a  "brain  damaged"  manner  by  being  literal,  inflexible,  and  unable  to  handle 
abstractions. 

Widrow  and  Hoff  [Widrow60]  extended  the  perceptron  model  by  proposing  a 
perceptron-like  system  that  could  potentially  learn  quickly  and  accurately.  The  neurons 
of  this  system  were  binary  threshold  logic  units  with  interconnections  of  variable 
strength. The neurons computed a weighted sum of the inputs, using the synaptic
weights, and added a bias term. If the sum was greater than zero, then the neuron
output was +1, and if the sum was less than or equal to zero, then the output was -1.

The  learning  rule  that  Widrow  and  Hoff  employed  was  a  simple  supervised 
algorithm that assumes that an input pattern has an output pattern made up of a series
of values that take on either +1 or -1. When learning, their system computed an error
signal for each neuron, $\mathrm{error}_j$, which is the difference between what the neuron
computed and the exact answer. The synaptic weights, $w_{ij}$, were then adjusted for
time $t + 1$ given the neuron input, $x_i$, and a constant, $\alpha$, as:

$$w_{ij}(t + 1) = w_{ij}(t) + \alpha \, x_i \, \mathrm{error}_j$$

This process would continue until the system's response was exactly correct (i.e., the
error signal became exactly zero).
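
A minimal Python sketch of this update rule for a single threshold unit follows. Several details are assumptions made for illustration: the error is taken as the target minus the thresholded +1/-1 output (so that training can stop at exactly zero error, as described above, whereas the classical least-mean-squares formulation uses the sum before thresholding), the bias is updated as if it were a weight on a constant input of one, and the names `threshold_output` and `train_widrow_hoff`, the value of `alpha`, and the AND training set are all hypothetical.

```python
def threshold_output(weights, bias, x):
    """Weighted sum plus bias, passed through a +1/-1 threshold at zero."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else -1


def train_widrow_hoff(patterns, targets, alpha=0.1, max_epochs=100):
    """Repeat w(t+1) = w(t) + alpha * x * error until every error is zero."""
    weights = [0.0] * len(patterns[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(patterns, targets):
            # Assumed sign convention: error = exact answer - computed output.
            error = target - threshold_output(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + alpha * xi * error for w, xi in zip(weights, x)]
                bias += alpha * error  # bias treated as a weight on input 1
        if mistakes == 0:
            break  # the response is exactly correct for every pattern
    return weights, bias


# Hypothetical training set: logical AND encoded with +1/-1 values.
AND_PATTERNS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
AND_TARGETS = [-1, -1, -1, 1]
```

Because the AND patterns are linearly separable, the loop reaches zero error well before the epoch limit; for a problem that is not linearly separable it would simply stop at `max_epochs`.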

Perceptrons  and  perceptron-like  systems  generated  a  great  deal  of  interest  in 
the  early  1960's  due  to  their  initial  successes  at  learning  some  simple  useful  functions 
and  exhibiting  brain-like  behavior;  however,  even  from  the  earliest  days,  many 
scientists  suspected  or  even  showed  that  the  types  of  problems  that  perceptrons  could 
solve were limited in scope [Rosenblatt58]. The book Perceptrons [Minsky69] simply
proved these limitations on what perceptrons could represent
and learn.

The analysis that Minsky and Papert employed is based on the simple
perceptron model shown in Figure 10. The computation of a function $\psi(X)$ in response
to some stimulus $X$ is performed in two stages. First, functions $\varphi(X)$ are computed and
combined through a function $\Omega$ that outputs a single value $\psi$. Minsky and Papert show
that this model performs like a logical predicate function. By imposing certain
conditions and restrictions on perceptrons, they were able to prove several important
limitations of perceptrons.

Figure 10: Minsky's Perceptron Model

The first limitation that they consider is that of order, which essentially
recognizes that only a limited number of input units, $\varphi_i$, can be connected to each
association unit, $\Omega$. The second limitation is that of diameter, where input units, $\varphi_i$,
can only connect a limited geometrical region to an association unit, $\Omega$. Because of
these limitations, Minsky and Papert proved that order and diameter limited
perceptrons could not compute the predicate for parity nor that of connectedness. The
parity problem requires counting the number of active inputs and determining if the
total is odd or even.¹ The connectedness problem is defined as a predicate for
determining if all points in any geometric figure are connected to one another.
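
The two-input parity (XOR) case gives a small concrete illustration of such a limitation. The Python sketch below is not Minsky and Papert's proof: it only searches an assumed grid of weights and thresholds for a single threshold unit and finds none that reproduces XOR, then shows a hand-built two-layer arrangement that does. The grid, the hidden-unit weights, and all function names are hypothetical choices made for this example.

```python
from itertools import product

# Truth table for two-input parity (XOR).
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def threshold_unit(inputs, weights, threshold):
    """Binary threshold unit: 1 if the weighted sum exceeds the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0


def single_unit_solves_xor(grid):
    """Try every (w1, w2, threshold) on the grid; True if any matches XOR."""
    for w1, w2, theta in product(grid, repeat=3):
        if all(threshold_unit(x, (w1, w2), theta) == y for x, y in XOR.items()):
            return True
    return False


def two_layer_xor(x):
    """Two hidden threshold units (OR and AND) feeding one output unit."""
    h_or = threshold_unit(x, (1, 1), 0.5)
    h_and = threshold_unit(x, (1, 1), 1.5)
    # Output fires when OR is on and AND is off.
    return threshold_unit((h_or, h_and), (1, -1), 0.5)


# Assumed search grid: values from -2 to 2 in steps of 0.25.
GRID = [i / 4 for i in range(-8, 9)]
```

The search fails for any grid, since a single unit would need each weight to exceed the threshold, the threshold to be non-negative, and the sum of both weights not to exceed the threshold, which cannot hold together.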

¹ The XOR problem is a parity problem with two inputs.

In summary, Perceptrons described a general dissatisfaction with the perceptron
concept and was not solely responsible for the decline of neural network research in
the United States. Because Minsky and Papert did such a clear and thorough job of
illuminating the perceptron's limitations, some of the presumptions presented in the
last chapter of Perceptrons did dampen and delay future work in neural networks, as
the following quote exemplifies:

...  we  consider  it  to  be  an  important  research  problem  to  elucidate  .  .  . 
our  intuitive  judgment  that  the  extension  to  [multiple  layer  systems]  is 
sterile.  [Minsky69,  page  232] 

This  assumption  was  later  proved  wrong  [Rumelhart86c]  using  the  back  propagation 
algorithm, which could solve the parity problem. Viewed optimistically, the
underlying result of Perceptrons was a chance to consolidate
and extend the field away from the glare caused by the initial hype created by early
successes. Where neural network research did continue was in the area of
psychological  modeling,  which  carried  artificial  neural  network  research  into  the 
1980's. 

Perhaps the most famous works presented in the early 1970s are those by
Kohonen [Kohonen72] and Anderson [Anderson72], who independently proposed the
same model for associative memories. Kohonen is primarily concerned with the
mathematical  properties  of  such  systems,  whereas  Anderson  focuses  on  the 
physiological  plausibility  of  these  systems.  The  linear  associator  proposed  in  these  two 
papers  is  markedly  different  from  the  perceptron.  They  consider  that  most  neurons  are 
not  binary  restricted  neurons,  with  only  two  possible  outputs,  but  have  continuous 
valued  outputs. 

The basic neuron model is a very simple analogue integrator with a continuous
valued output. It takes a set of inputs, X, multiplies them by the synaptic weights, W,
and adds them up; the neuron's output is proportional to the sum. The input-output
relations of such systems are specified in terms of matrix multiplication.

Since  these  models  are  memory  models,  they  require  some  type  of  learning  rule 
and  both  Anderson  and  Kohonen  use  a  generalization  of  Hebbian  learning,  which 
modifies  synaptic  weights  in  proportion  to  the  correlation  between  input  and  output 
elements.  In  mathematical  terms,  the  connection  matrix  storing  the  memories  becomes 
the  outer  product  of  the  input  and  output  vectors,  and  for  recall,  multiplying  the  input 
vector  by  the  connection  matrix  yields  the  output  vector. 

As  more  associations  are  stored  in  the  connection  matrix,  the  resulting 
association  is  generally  not  perfect.  The  only  case  where  association  is  perfect  is  when 
the input vectors are orthogonal. This puts an upper limit on the number of vectors
that can be stored based on the dimensionality of the memory (i.e., the connection
matrix). This obviously wastes neurons since the capacity of each neuron is not
necessarily  maximized,  but  the  usefulness  of  the  model  made  the  trade-off  acceptable. 
Although  this  type  of  system  is  simplistic  in  its  linearity,  it  can  model  many  useful 
properties  and  has  served  as  a  starting  point  for  larger,  more  complicated  systems. 
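
The outer-product storage and matrix-multiplication recall described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the notation of either paper: the names `store` and `associator_recall` are hypothetical, the connection matrix is the sum of the outer products of each output vector with its input vector, and the input vectors in the first example are assumed to be orthonormal so that recall is exact.

```python
def store(pairs):
    """Build the connection matrix as a sum of outer products y * x^T."""
    n_in = len(pairs[0][0])
    n_out = len(pairs[0][1])
    matrix = [[0.0] * n_in for _ in range(n_out)]
    for x, y in pairs:
        for i in range(n_out):
            for j in range(n_in):
                # Hebbian-style change: proportional to output_i * input_j.
                matrix[i][j] += y[i] * x[j]
    return matrix


def associator_recall(matrix, x):
    """Recall by multiplying the input vector by the connection matrix."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in matrix]


# Orthonormal inputs: each stored association is recalled exactly.
ORTHO_PAIRS = [((1.0, 0.0, 0.0), (1.0, -1.0)),
               ((0.0, 1.0, 0.0), (2.0, 3.0))]

# Non-orthogonal inputs: the second association contaminates the first.
OVERLAP_PAIRS = [((1.0, 0.0), (1.0,)),
                 ((1.0, 1.0), (5.0,))]
```

With the overlapping inputs, recalling the first input returns the sum of both stored outputs weighted by the overlap, which is the imperfect association the text refers to.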

Stephen  Grossberg  has  been  one  of  the  leaders  in  neural  network  research  over 
the  past  twenty-five  years.  His  work  is  founded  on  his  complex  and  very  detailed 
mathematical  analysis  of  brain  function  and  has  found  utility  in  many  areas,  even  in 
engineering  design  [Kamarthi93].  He  is  particularly  well  known  for  his  series  of 
computer  simulation  programs  implementing  variations  of  his  Adaptive  Resonance 
Theory or ART. This body of work does not directly use adaptive resonance theory;
however, Grossberg's research is consequential to current neural network theory. In
one  of  his  most  important  papers  [Grossberg80],  he  provides  access  to  much  of  his 
basic  thinking  on  how  the  brain  should  work. 

Most of Grossberg's work is related to how he thinks a neural network
should handle error correction during learning. His theory is different from Widrow
and Hoff's learning theory [Widrow60] and subsequent supervised learning
algorithms, and it serves as a basis for his series of ART programs. The key point
made by Grossberg is that a neural network should generally do error correction by
itself  rather  than  from  a  "teacher"  that  indicates  what  is  wrong.  In  order  to  do  this  type 
of  error  correction,  Grossberg  suggests  that  the  neural  system  is  made  up  of  two 
systems  in  series  communicating  with  each  other.  Input  or  stimulus  to  one  neural 
system  causes  some  neurons  to  be  stimulated  on  the  other  system. 

When  one  input  pattern  causes  the  wrong  set  of  cells  to  be  stimulated  in  the 
second  system,  this  corresponds  to  an  error.  By  having  reciprocal  connections  between 
the second neural system and the first, a pattern of activity that stimulates the second
system will result in "learned feedback" returning to the first neural system. Thus, the
associated  pattern  from  the  second  system  interacts  with  the  actual  input  pattern,  and 
the  neural  network  requires  no  outside  error  correction  feedback. 

Another  way  in  which  Grossberg  deviates  from  the  norm  concerning  error 
correction  is  that  instead  of  using  the  difference  between  the  input  pattern  and  stored 
pattern, he proposes using the sum. In using the sum, Grossberg must deal with
effectively suppressing small signals and enhancing large ones through what he calls a
"quenching threshold."

Grossberg  tests  for  matching  between  the  input  pattern  and  the  learned 
feedback. If the learned feedback matches the input pattern, then the sum is larger than
the input pattern alone, and the matching peaks are enhanced.
On the other hand, if the feedback and input do not match, then the
sum  will  be  more  uniform  without  the  enhanced  peaks  of  a  correct  signal.  If  the 
quenching  threshold  is  properly  set,  some  neural  activity  must  become  suppressed  in 
order  to  tune  the  response. 

Grossberg  notes  that  the  feedback  portion  of  a  set  of  neurons  can  in  essence 
provide  a  means  to  maintain  a  set  of  patterns  in  what  he  terms  "short  term  memory" 
that  remains  active  even  if  some  inputs  are  shut  off.  When  an  input  and  feedback 
pattern  match,  strong  signals  result  and  the  patterns  reinforce  one  another.  Grossberg 
refers  to  this  phenomenon  as  "adaptive  resonance."  Such  network  dynamics  are  now 
used  in  many  neural  networks,  even  though  the  feedback  mechanisms  may  be  different. 

The  work  of  Hopfield  [Hopfield82]  brought  together  many  of  the  ideas  of  his 
precursors  in  neural  network  research;  however,  Hopfield  augmented  his  discussion 
with  a  clear  and  detailed  mathematical  analysis  of  a  computational  neural  network. 
Hopfield's network is recurrent since every neuron connects to every other
neuron. The neurons can achieve only binary states, and he assumes that the neural
network  needs  to  learn  a  set  of  states  or  activation  patterns.  The  network  employs 
Hebbian learning to define the state space for a given set of activation patterns.

Hopfield's premise was that the function of a neural system is to develop a
number  of  locally  stable  points  or  attractors  in  state  space.  Other  points  in  state  space 
flow  toward  a  stable  point.  This  is  the  most  interesting  aspect  of  Hopfield's  work,  and 
by  making  the  connection  that  neural  network  dynamics  are  analogous  to  statistical 
mechanics,  he  helped  broaden  the  field  of  neural  networks.  No  longer  would  the  field 
solely  consist  of  those  researchers  wishing  to  understand  how  neural  networks  work 
from a biological and psychological standpoint, but his work prompted many other
scientists  to  ask  what  artificial  neural  networks  could  accomplish. 

In  making  this  analogy,  Hopfield  defined  a  term  that  is  analogous  to  energy 
and  that  characterized  the  current  network  state  as  either  stable  or  unstable.  He 
mathematically  showed  that  the  algorithm  for  modifying  an  unstable  state  vector 
causes  the  energy  term  to  be  monotonically  decreasing;  thus,  state  changes  continue 
until  a  local  minimum  energy  is  achieved.  The  modifying  algorithm  chooses  a  neuron 
unit at random, examines its inputs, and changes its state to either on or off,
depending  on  the  sum  of  the  inputs  being  above  or  below  a  set  threshold.  Therefore, 
the  system  energy  either  decreases  or  remains  the  same  and  a  stable  state  corresponds 
to  an  energy  minimum. 
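
The energy argument can be made concrete with a small Python sketch. It is a simplified illustration and rests on several assumptions that are not taken from Hopfield's paper as summarized here: states are coded as +1/-1 rather than on/off, all thresholds are zero, there are no self-connections, weights come from a simple Hebbian outer-product rule, and every unit is visited once per sweep in a random order. The function names are hypothetical.

```python
import random


def hebbian_weights(patterns):
    """Symmetric Hebbian weights with a zero diagonal (no self-connections)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def energy(w, s):
    """Energy-like term: E = -1/2 * sum_ij w_ij * s_i * s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def hopfield_recall(w, state, rng, max_sweeps=50):
    """Update units one at a time until no unit changes (a stable state)."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.sample(range(n), n):  # random visiting order
            field = sum(w[i][j] * s[j] for j in range(n))
            new = s[i] if field == 0 else (1 if field > 0 else -1)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s


STORED = [1, -1, 1, -1, 1, -1]
NOISY = [1, -1, 1, -1, 1, 1]  # last unit flipped
```

Starting from `NOISY`, each accepted change can only lower the energy or leave it unchanged under these assumptions, and the state settles on the stored pattern.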

Hopfield  identified  many  useful  properties  of  his  network  that  closely  follow 
from  his  original  premise.  His  networks  have  a  built-in  error  correcting  mechanism, 
since  deviations  from  stable  points  disappear  as  all  points  in  state  space  flow  toward  a 
stable  point.  If  the  network  is  given  incomplete  information,  that  point  in  state  space 
moves to a stable point that appropriately reconstructs any missing information.

Ackley et al. [Ackley85] developed a connectionist system called a "Boltzmann
machine."  It  is  an  artificial  neural  network  whose  basic  elements  are  similar  to  those 
used  by  Hopfield  and  is  well  suited  to  constraint  satisfaction  tasks.  Boltzmann 
machines  work  with  weak  constraints  in  the  sense  that  the  best  solution  to  a  problem 
can  violate  some  constraints  and  the  quality  of  the  solution  is  determined  from  the  cost 
or number of the violated constraints. Although a Boltzmann machine network is
similar to a Hopfield network, there are several important and interesting differences.
Like  a  Hopfield  network,  the  neuron  processing  units  of  a  Boltzmann  machine  are 
binary  and  are  connected  by  weights  that  can  take  on  real  values.  They  add  up  all  their 
inputs  and  compare  the  sum  to  a  threshold.  If  the  value  is  greater  than  the  threshold, 
then  the  neuron  takes  a  value  of  one,  else  the  unit  takes  on  a  value  of  zero. 

As in a Hopfield network, the energy of the system monotonically decreases to
a local minimum, rather than a global minimum. Using Hopfield's simple updating rule,
the network will get caught in local minima, as is the case with gradient descent
and hill-climbing algorithms. Being stuck in a local minimum is not a problem with
Hopfield networks since they are associative memory models used to store items at the
local minima. Local minima, however, cause problems for constraint satisfaction
tasks since the quality of the solution suffers if the network settles at a local rather
than a global minimum.

Ackley  et  al.  proposed  that  a  simple  way  to  get  out  of  a  local  minimum  is  to 
occasionally allow the system to jump to higher energy states. This process is
stochastic and based on the energy gap between a unit being on or off ($\Delta E$). The unit
is turned on with a probability given by the following equation:

$$p = \frac{1}{1 + e^{-\Delta E / T}}$$

The term, $T$, acts like temperature, and a system of units will eventually reach "thermal
equilibrium" as the temperature is lowered. The relative probability of the two global
states obeys the Boltzmann distribution given as

$$\frac{P_\alpha}{P_\beta} = e^{-(E_\alpha - E_\beta)/T}$$

Here, $P_\alpha$ is the probability of being in the $\alpha$th global state, and $E_\alpha$ is the energy of that
state. The $\beta$ subscript terms represent the other energy state.

The  temperature  term  controls  the  sensitivity  of  energy  differences.  At  low 
temperatures,  there  is  a  strong  bias  towards  low  energy  states  but  at  a  high  time  cost 
to  reach  those  states.  Conversely,  at  high  temperatures  equilibrium  is  achieved  faster 
but  at  a  higher  energy  state.  Thus,  similar  to  annealing  in  metals,  starting  at  high 
temperatures  and  progressing  to  lower  temperatures,  gradually  cooling  the  system, 
allows the system to rapidly approach equilibrium. High temperatures allow a coarse
search of the global state space, and lower temperatures let the system respond to
smaller energy differences within the coarse minimum discovered at higher
temperatures.  Both  Hopfield's  network  and  the  Boltzmann  machine  network  provided 
the  basis  for  another  network,  Harmony  Theory,  that  is  discussed  in  the  next  chapter. 
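
The stochastic update and the gradual cooling described above can be sketched in Python as follows. This is a hedged illustration, not the procedure of [Ackley85] in detail: the energy gap is assumed to be the summed weighted input from the other units minus the unit's threshold, units are visited once per sweep in random order, and the names `p_on`, `energy_gap`, and `anneal`, the cooling schedule, and the tiny example network are all hypothetical.

```python
import math
import random


def p_on(delta_e, temperature):
    """Probability that a unit turns on: 1 / (1 + exp(-delta_e / T))."""
    return 1.0 / (1.0 + math.exp(-delta_e / temperature))


def energy_gap(weights, thresholds, state, k):
    """Assumed gap for unit k: weighted input from other units minus threshold."""
    total = sum(weights[k][j] * state[j] for j in range(len(state)) if j != k)
    return total - thresholds[k]


def anneal(weights, thresholds, state, schedule, rng, sweeps_per_temp=20):
    """Run stochastic unit updates at each temperature, from high to low."""
    s = list(state)
    n = len(s)
    for temperature in schedule:
        for _ in range(sweeps_per_temp):
            for k in rng.sample(range(n), n):  # random visiting order
                gap = energy_gap(weights, thresholds, s, k)
                s[k] = 1 if rng.random() < p_on(gap, temperature) else 0
    return s


# Hypothetical three-unit network in which every unit prefers to be on.
WEIGHTS = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
THRESHOLDS = [-1.0, -1.0, -1.0]
SCHEDULE = [10.0, 1.0, 0.01]  # assumed cooling schedule
```

At the high starting temperature the probability stays near one half, so units flip almost at random; at the final low temperature the same energy gaps give probabilities that are effectively zero or one, and the network settles.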

The network dynamics of a Boltzmann machine are effective in finding
minimums  in  the  energy  state  space  for  given  input  patterns.  At  the  time  Ackley's 
paper  was  written,  neural  networks  were  on  the  verge  of  making  a  comeback,  but  only 
if  a  generalized  learning  algorithm  could  be  found.  Minsky  and  Papert  [Minsky69]  had 
shown  that  single  layer  perceptron  networks  were  incapable  of  solving  many 
interesting  but  simple  problems.  In  order  to  solve  these  types  of  problems,  a  network 
must  contain  nonlinear  processing  units  that  are  not  directly  constrained  by  the  input; 
however,  up  until  this  time,  training  multiple-layer  networks  was  impossible.  When  the 
network  produced  wrong  results,  it  was  seemingly  impossible  to  determine  which  of 
the  many  connection  strengths  were  at  fault. 

Ackley  et  al.  developed  a  learning  algorithm  for  Boltzmann  machine  networks 
that  overcame  this  problem.  The  neuron  units  of  a  Boltzmann  machine  are  divided  into 
two  types.  The  first  type  is  a  set  of  "visible"  units  that  are  the  interface  between  the 
network and the environment. The second type is a set of "hidden" units that are not fixed
and  can  represent  more  complex  relationships  about  the  visible  units.  The  definition  of 
learning  for  Boltzmann  machines  involves  matching  probabilities  between  a  given 
external  environment  (input/output  pairs)  and  the  network.  In  essence,  learning 
attempts  to  find  a  set  of  weights  that  is  most  likely  to  have  generated  the  given 
environment,  so  if  the  probabilities  of  states  of  the  network  match  the  probabilities  of 
states  of  the  environment,  then  the  network  accurately  represents  the  environment. 
The  learning  rule  to  adjust  the  weights  to  increase  the  fit  between  the  network  and  the 
environment is simple; however, the derivation is complex [Ackley85]. Learning starts
by letting the network run to an initial equilibrium state such that probabilities of the
states  for  each  unit  can  be  estimated.  The  second  step  fixes  the  visible  units  to  take 
appropriate  values  as  specified  by  the  input/output  pairs,  and  values  of  the  probabilities 
of  the  states  of  units  are  again  estimated.  Local  weight  changes  are  then  made 
proportional  to  the  difference  in  the  probabilities  of  the  units  coupled  by  their  linking 
weight.  Although  only  locally  available  information  is  used  to  change  weights,  the 
change  moves  toward  an  optimum  of  a  global  measure  of  fitness.  The  primary 
drawback  is  that  this  is  a  slow  learning  process  since  the  network  must  estimate 
complete  sets  of  probabilities  in  order  to  adjust  weights. 
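
The local weight change described above can be sketched as follows. This is only a minimal illustration, assuming the probabilities for the free-running and clamped phases have already been estimated; the function name boltzmann_weight_update, the rate epsilon, and the sample probabilities are hypothetical and are not taken from [Ackley85].

```python
def boltzmann_weight_update(p_clamped, p_free, epsilon=0.1):
    """Return weight changes proportional to the difference between the
    clamped-phase and free-running probability estimates for each pair."""
    n = len(p_clamped)
    delta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # assumption: no self-connections in this sketch
                delta[i][j] = epsilon * (p_clamped[i][j] - p_free[i][j])
    return delta


# Hypothetical estimates for a three-unit network: entry [i][j] is the
# estimated probability that units i and j are active together.
p_free = [[0.0, 0.2, 0.5],
          [0.2, 0.0, 0.3],
          [0.5, 0.3, 0.0]]
p_clamped = [[0.0, 0.6, 0.5],
             [0.6, 0.0, 0.1],
             [0.5, 0.1, 0.0]]

delta_w = boltzmann_weight_update(p_clamped, p_free)
```

Each entry of delta_w depends only on quantities available at the two units it couples, which is the locality property noted in the text. Obtaining the probability estimates themselves is the slow part and is not shown here.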

The  papers  discussed  in  this  section  are  some  of  the  seminal  works  that  have 
stimulated research in this field for almost half a century. As can be seen, research has
originated from many different, seemingly disparate fields but is unified by a
common  goal:  to  understand  the  workings  of  massively  parallel  networks  similar  to 
biological  neural  networks.  This  concludes  the  historical  perspective,  and  the 
remainder  of  this  chapter  reviews  general  properties  of  artificial  neural  networks. 

The  Neuron 
As  can  be  inferred  from  the  previous  discussions,  artificial  neural  networks  can 
appear  to  be  quite  diverse;  however,  all  these  systems  have  a  great  deal  in  common. 
This  section  identifies  recurrent  themes  and  begins  by  briefly  examining  the  structure 
and  behavior  of  biological  neurons  since  they  strongly  influence  artificial  neural 
networks. Next, general characteristics of computational neurons are examined.

The Biological Neuron

Artificial  neural  networks  are  biologically  inspired,  and  may  be  considered  as 
just  a  different  level  of  abstraction  of  their  biological  counterparts.  Initially,  many 
researchers  turned  to  artificial  neural  networks  to  help  hypothesize  about  the  brain's 
overall  operation;  however,  in  the  past  ten  years,  even  though  the  functions  of  artificial 
neural  networks  are  often  suggestive  of  human  cognition,  some  network  designers 
have gone beyond biological knowledge of the brain and discarded biological
plausibility.

[Figure 11: A Biological Neuron (labels: dendrites, synapses, nucleus)]

Biological neurons make up a nervous system. It is estimated that humans have more
than 10^11 neurons with perhaps 10^15 interconnections between neurons. Neurons
receive, process, and transmit electrochemical impulses. A typical biological neuron
is shown in Figure 11. A neuron has a well defined region that houses the nucleus and
is known as the cell body. Originating from the cell body
are  long,  branching  fibers  that  are  divided  into  possibly  several,  shorter  dendrites  and  a 
single,  longer  axon.  Dendrites  extend  from  the  cell  body,  branching  to  provide 
receptive  surfaces  for  signals  from  other  neurons.  The  cell  body  gathers  all  input 
signals from its dendrites, and if the correct stimulus is received, then the cell body
conducts a signal to the axon. The axon extends from the cell body and terminates in a
branching  pattern  called  terminal  fibers.  The  terminal  fibers  connect  to  the  dendrites  of 
other  neurons  through  synapses,  which  transmit  an  electrochemical  signal  from  one 
neuron  to  another. 

Functionally,  a  biological  neuron  is  a  complex,  electrochemical  device.  The 
activity  of  a  neuron  is  measured  in  terms  of  firing  frequency,  which  is  the  number  of 
axon  signals  generated  in  a  constant  time  interval.  The  axon  signals  are  continuously 
valued.  When  the  signals  reach  the  synapses,  they  are  transmitted  to  neighboring 
neurons  through  a  chemical  transmitter.  Given  a  series  of  input  signals,  a  neuron  can 
be either excitatory or inhibitory. An inhibitory action suppresses the transmission of a
signal, whereas an excitatory response continues transmission of a signal. Some
neurons  are  capable  of  transmitting  many  different  types  of  signals  via  different 
chemical  transmitters  and  potentials. 

The  Computational  Neuron 

Computational neurons[2] mimic the simplest capabilities of biological neurons.
Although  there  are  many  types  of  artificial  neurons,  this  section  describes  the  basic 
functioning  of  artificial  neurons.  Computationally,  the  functioning  of  artificial  neurons 
can  be  divided  into  the  following  three  stages: 

1. Input

2.  Activation 

3.  Output 


[2] Computational neurons are also called processors or units.

At the highest level of abstraction, a computational neuron receives a set of inputs, X,
representing output from other neurons or possibly input from the environment.
Dendrites perform this task in a biological neuron. The set of inputs is multiplied by
corresponding weights, W, which represent synaptic strengths. These results may be
passed through a function, $\phi$. The result to this point is called the input to a neuron.
This input is passed to an activation function, f, to determine the level of activation,
S, for that neuron. In a biological neuron, this occurs in the cell body. The activation
level, S, may then go through an output function, g, producing the output value, F,
which is passed on to all other connected neurons. This is similar to the function of an
axon. Many computational neurons simply output the result of the activation function.[3]
Figure 12 illustrates this general concept.
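
The three stages just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the weighted sum for the input function, the sigmoid for the activation function, and the identity for the output function are common choices rather than requirements, and all function names are hypothetical.

```python
import math


def weighted_sum(inputs, weights):
    """Input function phi: multiply each input by its weight and sum."""
    return sum(w * x for w, x in zip(weights, inputs))


def sigmoid(u):
    """Activation function f (one common choice)."""
    return 1.0 / (1.0 + math.exp(-u))


def identity(s):
    """Output function g: pass the activation along unchanged."""
    return s


def neuron_output(inputs, weights, phi=weighted_sum, f=sigmoid, g=identity):
    u = phi(inputs, weights)  # input stage
    s = f(u)                  # activation stage
    return g(s)               # output stage


# Excitatory and inhibitory inputs cancel here, so the total input is zero.
y = neuron_output([1.0, 0.0, 1.0], [0.5, -0.3, -0.5])
```

Passing the three functions as parameters mirrors the separation of stages in the text: replacing any one of them changes the neuron type without touching the other two.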


[Figure 12: Computational Neuron (stages: input, activation, output)]


[3] Since many computational neurons use the result of the activation stage as their
output, the neuron's output is sometimes called the activation or activation level of a
neuron.

Input stage

Although there are no restrictions as to the type of function that a neuron may
use at the input stage, most employ a simple combination of the inputs and the
weights. The function, $\phi(x_i, w_i)$, typically multiplies each input value by the
corresponding weight and sums the products:

$$u = \phi(x_i, w_i) = \sum_{i=1}^{n} w_i x_i$$

where n is the number of inputs to a neuron. Variations on this theme exist and have
been employed in some networks.

Activation  stage 

The activation state, S, of a neuron is indicative of that neuron's contribution
to the state of the network at a particular time in response to some input.
Computational neurons can only take on a single activation value at any time, and
typical computational neurons are limited in the range of values they can achieve. The
activation of biological neurons is more complex, but they also appear to have a
limited activation range.

The simplest type of computational neuron is the binary type. The activation
of this type of neuron is limited to two values in the set

$$S = \{0, 1\} \quad \text{or} \quad S = \{-1, 1\},$$

where the value of 1 indicates that the neuron is active [McCulloch43, Widrow60,
Hopfield82].



Continuous neurons, which take activation values from the complete set of real
numbers,

$$S = \mathbb{R},$$

are on the other end of the spectrum from binary neurons. Subsets of this type of
neuron are those that take on real values from a bounded, closed interval such as

$$S = [0, 1] \quad \text{or} \quad S = [-1, 1].$$

Several  networks  [Kohonen72,  Anderson72,  Ackley85]  use  these  types  of  neurons. 

The activation values are arrived at by passing the results from the input stage
through an activation function. Two classes of activation functions are employed,
depending on the type of network: stochastic and deterministic activation functions.
Stochastic activation functions compute the probability that a neuron will have an
activation value based on some probability distribution function, f, the current input
to the neuron, u, the previous activation state, $S_{t-1}$, and possibly a global
parameter analogous to temperature, T:

$$p(S) = f(u, S_{t-1}, T)$$

A common probability distribution function is the Boltzmann distribution,

$$p(u) = e^{(-u/T)}$$

The global parameter, T, has an important effect on the shape of this function, and
most stochastic networks vary T based on an annealing schedule. German and German
[German84] showed that the rate of temperature reduction in the annealing schedule
should be proportional to

$$T(t) = \frac{T_0}{\log(1 + t)}$$

where $T_0$ is the initial temperature and t represents time. Using this annealing schedule,
the neuron activations will gradually settle onto extremes that represent desirable
network states [Hopfield82, Ackley85]; however, as can be seen from Figure 13, this
equation predicts very slow cooling rates. As a result, both running and training
stochastic networks can take impractical periods of time.
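
How slowly this schedule cools can be seen by evaluating it directly. The sketch below assumes a natural logarithm and integer time steps starting at t = 1; the function name annealing_temperature and the value chosen for the initial temperature are hypothetical.

```python
import math


def annealing_temperature(t0, t):
    """Temperature T(t) = T0 / log(1 + t), assuming a natural logarithm."""
    if t < 1:
        raise ValueError("t must be at least 1 so that log(1 + t) is positive")
    return t0 / math.log(1.0 + t)


t0 = 10.0  # hypothetical initial temperature
schedule = [(t, annealing_temperature(t0, t)) for t in (1, 10, 100, 1000, 10000)]

for t, temperature in schedule:
    print(t, round(temperature, 3))
```

Because the denominator grows only logarithmically, each tenfold increase in the number of steps lowers the temperature by a shrinking amount, and in this example the temperature after ten thousand steps is still above one tenth of the initial value.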


[Figure 13: German's Annealing Schedule (vertical axis: T(t))]

Deterministic activation functions may be subclassed into three categories:

1. linear,

2. thresholding, and

3. squashing.

The squashing class is predominant since it can emulate the others by varying its
parameters. The most common linear activation function is the identity function,
which passes the total input to connected neurons. Thresholding activation functions
require a specified threshold value, $\theta$, against which to compare the total input, u.
There are several types of thresholding activation functions. The most common are
the step function (Figure 14):

$$f(u) = \begin{cases} -1, & \text{if } u < \theta \\ 1, & \text{otherwise} \end{cases}$$

[Figure 14: Step Function]

and the linear threshold function (Figure 15):

$$f(u, \theta_1, \theta_2) = \begin{cases} -1, & \text{if } u < \theta_1 \\ \dfrac{1}{\theta_1 - \theta_2}\,(-2u + \theta_1 + \theta_2), & \text{if } \theta_1 \le u \le \theta_2 \\ 1, & \text{if } u > \theta_2 \end{cases}$$

[Figure 15: Linear Threshold Function]

Squashing functions are used in those networks where an unbounded set of real
values must be mapped to a bounded set. The most common is the sigmoid function:

$$f(u) = \frac{1}{1 + e^{-\beta u}}$$

where larger values of $\beta$ produce a step like function (Figure 16) and smaller values
cause the sigmoid to behave like a linear function (Figure 17).

[Figure 16: Step Sigmoid Function]
[Figure 17: Linear Sigmoid Function]
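
The three deterministic classes can be compared side by side with a short sketch. The function names and default parameter values are hypothetical, and the sigmoid is written with the steepness parameter multiplying the input, which is an assumption consistent with the behavior described for large and small parameter values.

```python
import math


def step(u, theta=0.0):
    """Step function: -1 below the threshold, 1 otherwise."""
    return -1.0 if u < theta else 1.0


def linear_threshold(u, theta1, theta2):
    """Linear ramp from -1 at theta1 to 1 at theta2, clipped outside."""
    if u < theta1:
        return -1.0
    if u > theta2:
        return 1.0
    return (-2.0 * u + theta1 + theta2) / (theta1 - theta2)


def sigmoid(u, beta=1.0):
    """Squashing function; beta controls the steepness."""
    return 1.0 / (1.0 + math.exp(-beta * u))


steep = sigmoid(0.2, beta=50.0)    # close to the upper bound, step like
shallow = sigmoid(0.2, beta=0.1)   # close to 0.5 + beta * u / 4, nearly linear
```

Note that the sigmoid as written is bounded by 0 and 1, whereas the two thresholding functions are bounded by -1 and 1; the comparison concerns the shape of the curve rather than its range.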
Output stage

Networks can use any function as an output function, g(S); however, most
networks simply pass along their activation value to connected neurons. To reflect the
lack of a specific neuron output function, the output of a neuron is often called its
activation or activation level.

Networks of Neurons
Both computational and biological neurons by themselves are incapable of doing
much more than turning themselves on or off. As an example, consider a single neural
receptor in a retina. As an individual neuron, it is incapable of recognizing a familiar
visual scene; however, organized with millions of other neurons, it allows us to recall
well-known places. Thus, it is clear that the information processing power of neural
systems does not need to come from using complex neurons but from the aggregate
activity of a network of neurons. Rumelhart et al. [Rumelhart86b] characterize this
important feature as, "all the knowledge is in the connections" (p. 75), and the primary
research focus in artificial neural networks is on the behavior of these systems, not on
developing complex neurons.

There are three fundamental factors in determining what an artificial neural
network can accomplish. These are

1. the structure or network topology,

2. the representational scheme, and

3. network dynamics.

In order to use neural networks to solve practical problems, issues of
representation must be addressed since neural models must interact with and represent
their environment in some way. The topology of a network, with reference to its
organization,  strongly  influences  the  representational  capabilities.  Therefore,  network 
users  must  not  only  choose  how  they  pose  a  problem  and  interpret  results,  but  also  the 
structure  of  the  network  in  order  to  correctly  encode  features  of  the  environment  that 
are  vital. 

Neural  networks  respond  to  external  input.  The  dynamics  of  this  response  are 
implicitly time related. That is, the state of individual neurons changes over time in
response to the activation of other connected neurons. The result of processing is an
equilibrium condition on the state of all neurons. The dynamics of processing greatly
affect the capabilities of these networks. This section provides a conceptual and
mathematical  framework  for  assemblies  of  simple  neurons  and  aspects  of  their 
dynamics. 

Network  Structure 

The  structure  of  a  neural  network  is  defined  by  a  finite  set  of  neurons  and  the 
connectivity  between  neurons.  The  set  of  neurons  is  usually  homogeneous  with  respect 
to  their  computational  characteristics.  The  size  of  a  network  is  simply  the  number  of 
neurons. Mathematically, sets of neurons can simply be represented by vectors of
activation  values. 

Connectivity between neurons is represented by weight matrices, $w_{ij}$. The
indices of the coefficients indicate both the direction of the connection and which two
neurons are connected. The row index, i, indicates the index of the neuron from which
a signal may emanate, and the column index, j, denotes the index of the neuron that
will receive the signal. The coefficients are normally real values with positive numbers
representing excitatory connections and negative values indicating inhibitory
connections. When two neurons are not connected, the corresponding weight is
zero. When a network requires both feedforward and feedback connections between
neurons, a square weight matrix results, and oftentimes the strength of the connection
must be the same in both directions; therefore, a symmetric square matrix arises.
Weights along the diagonal act as biases.[4]
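
The indexing convention can be made concrete with a small example. The matrix values and the helper names below are hypothetical; the sketch only assumes that the entry in row i and column j is the strength of the connection from neuron i to neuron j, as stated above, and it leaves the diagonal at zero for simplicity.

```python
# w[i][j] is the strength of the connection from neuron i to neuron j.
# Positive entries are excitatory, negative entries are inhibitory, and a
# zero entry means the two neurons are not connected.
w = [
    [0.0, 0.8, -0.5],
    [0.8, 0.0, 0.0],
    [-0.5, 0.0, 0.0],
]


def is_symmetric(matrix):
    """True when every connection has the same strength in both directions."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def net_inputs(activations, weights):
    """Total input arriving at each neuron j: sum over i of S_i * w[i][j]."""
    n = len(activations)
    return [sum(activations[i] * weights[i][j] for i in range(n)) for j in range(n)]


u = net_inputs([1.0, 0.0, 1.0], w)
```

Here neuron 1 receives excitation from neuron 0, while neurons 0 and 2 inhibit one another, so the vector u holds the total input each neuron would pass to its activation function.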

Identifying  layers  of  neurons  is  a  convenient  way  to  help  analyze  networks. 
Identification  of  layers  is  done  according  to  the  direction  of  connections  among 
neurons.  Multiple  layer  networks  have  multiple  weight  matrices  that  designate  the 
connectivity  between  connected  layers.  Each  layer  of  neurons  is  represented  by  a 
separate  vector  of  activations. 

Layers  of  a  network  are  differentiated  according  to  their  visibility  to  the 
outside  environment.  Neurons  in  a  layer  that  directly  receives  input  from  the 
environment  form  the  "input  layer."  Neurons  that  represent  the  results  of  processing 
are  organized  into  a  layer  called  the  "output  layer."  All  other  neurons  are  known  as 
"hidden  neurons"  and  may  be  organized  into  one  or  more  "hidden  layers,"  depending 
upon  their  connectivity. 


[4] A neuron bias is applied as an offset to the origin of the activation function,
producing a threshold effect.

Distributed and Local Representations

Representational issues are divided into internal and external interpretations.
External representations are dealt with in the following chapters concerning specific
networks and problem domains. In this section, we inspect internal representations and
information processing of neural networks.

Considering  an  assembly  of  neurons,  neural  processing  is  a  change  in  the 
activation  states  or  activation  levels  of  those  neurons.  We  can  interpret  these  patterns 
of  activation  as  either  a  local  representation  or  as  a  distributed  representation 
[Hinton86].  In  local  representations,  each  neuron  represents  a  concept.  In  distributed 
representations,  concepts  are  represented  by  specific  patterns  of  activity  distributed 
over  a  set  of  neurons.  Each  neuron  may  be  part  of  the  activation  pattern  consistent 
with more than one concept. Thus a neuron, instead of representing an entire concept,
represents a "microfeature" of a concept. Connection strengths between microfeatures
stand  for  "microinferences"  [Hinton86,  page  80]. 

Local  representations  make  understanding  what  the  network  is  doing  much 
easier  since  the  state  of  activation  of  each  neuron  has  a  definite  meaning.  It  is  also 
easier  to  hand-craft  the  structure  of  the  network  including  weight  magnitudes.  On  the 
other  hand,  distributed  representations  have  a  number  of  compelling  features  that 
make  them  attractive.  Using  distributed  representations  makes  a  network  more  robust, 
improves  generalization  capabilities,  and  gives  a  network  the  ability  to  generate  new 
concepts. Because each neuron depicts a microfeature and the network represents any
concept via many neurons, noise or a damaged neuron will have little effect on the
performance of the network. It is entirely conceivable that a damaged network will still
be  able  to  perform  adequately.  A  network's  generalization  ability  may  be  improved 
since  microfeatures  may  be  combined  in  new  patterns  of  activation  and  thus  represent 
new  concepts. 
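
The robustness argument can be illustrated with a toy comparison. Everything in this sketch is an assumption made for illustration: the concept names, the binary patterns, and the use of Hamming distance to decide which stored concept a damaged pattern most resembles.

```python
# Local representation: one neuron per concept.
local = {
    "beam": [1, 0, 0],
    "truss": [0, 1, 0],
    "arch": [0, 0, 1],
}

# Distributed representation: each concept is a pattern over six neurons,
# and each neuron takes part in more than one pattern.
distributed = {
    "beam": [1, 1, 0, 0, 1, 0],
    "truss": [0, 1, 1, 0, 0, 1],
    "arch": [1, 0, 0, 1, 0, 1],
}


def damage(pattern, index):
    """Return a copy of the pattern with one neuron forced to zero."""
    damaged = list(pattern)
    damaged[index] = 0
    return damaged


def decode(pattern, codebook):
    """Return the concepts whose stored pattern is closest to the given one."""
    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))

    best = min(distance(pattern, stored) for stored in codebook.values())
    return [name for name, stored in codebook.items()
            if distance(pattern, stored) == best]


local_result = decode(damage(local["beam"], 0), local)
distributed_result = decode(damage(distributed["beam"], 0), distributed)
```

With the single neuron for "beam" removed, the local pattern is equally far from every stored concept, whereas the damaged distributed pattern is still closest to "beam".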

Distributed  representations  are  not  without  caveats.  Most  distributed 
representations  are  more  efficient  in  terms  of  memory  usage  since  each  neuron  may  be 
part  of  several  patterns;  however,  distributed  representations  do  not  guarantee  this.  In 
addition,  different  concepts  cannot  be  represented  in  a  network  at  the  same  time,  but 
using  subnetworks  can  eliminate  this  problem.  Perhaps  the  most  difficult  problem  is  in 
clarifying  the  relationship  between  knowledge  stored  in  the  activation  patterns  of  a 
network  using  distributed  representations  and  traditional  knowledge  representation 
schemes.  Using  distributed  representations  makes  identification  of  what  knowledge 
has  been  stored  in  the  network  extremely  difficult. 

Network  Dynamics 

Network dynamics are divided into two processes, activation and learning
dynamics. Each is implicitly a function of time, which is discretized and abstracted into
steps, epochs, or cycles. Activation dynamics involve determination of the activation
level  of  all  neurons  in  an  assembly.  Activation  dynamics  are  typically  very  fast  and 
analogous  to  biological  neural  assemblies  where  total  required  computation  time  is  in 
the  10  millisecond  range.  Learning  dynamics  involve  changing  the  synaptic  weights  of 
the  assembly,  which  requires  more  time  and  can  take  on  the  order  of  weeks  to 
accomplish.

Activation dynamics involve a change in the pattern of activation of neurons in
a network assembly. As mentioned in the previous section on the computational
aspects of neurons, determining the activation level of a single neuron involves the
following three steps:

1. Input processing, $u = \phi(x(t), w)$

2. Activation, $S = f(u)$

3. Output, $F = g(S)$

Each neuron computes its activation independently, in a time sense, of other
neurons. They compute their activation based only on the output provided by
connected  neurons.  In  keeping  with  biological  plausibility,  most  networks  follow  some 
form  of  a  parallel  update  schedule  in  that  many  if  not  all  neurons  simultaneously 
change  their  states.  Parallel  update  schedules  can  be  synchronous  or  asynchronous. 
For  hierarchical  networks,  where  neurons  are  grouped  into  layers,  synchronization  of 
processing  normally  requires  that  neurons  in  a  layer  wait  until  neurons  in  the  preceding 
layer  compute  their  outputs.  This  is  a  form  of  synchronous  updating  since  layers  or 
groups  of  neurons  compute  their  output  simultaneously  while  all  other  neurons  remain 
fixed.  Computations  depend  on  the  previous  time  step's  activations,  and  the  network 
propagates  new  activations  only  after  all  neurons  in  the  group  complete  their 
computations.  Asynchronous  updates  allow  neurons  to  change  their  states 
independently  and  simultaneously.  Asynchronous  updates  do  not  account  for  the 
possibility  that  computations  might  not  be  up  to  date  and  be  based  on  old  neuron 
activations  that  might  be  in  the  process  of  changing. 
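
The difference between the two update schedules can be seen in a two-neuron example. This is a sketch under stated assumptions: the neurons are binary with a step activation, the weights are hypothetical, and the asynchronous schedule is simulated by updating neurons one at a time in a fixed order, which is a common simplification of truly independent updates.

```python
def step(u):
    """Binary activation: 1 when the total input is non-negative, else -1."""
    return 1 if u >= 0 else -1


def net_input(j, state, w):
    """Total input to neuron j from all neurons i through weights w[i][j]."""
    return sum(state[i] * w[i][j] for i in range(len(state)))


def synchronous_update(state, w):
    """Every neuron computes from the previous state; all change together."""
    return [step(net_input(j, state, w)) for j in range(len(state))]


def asynchronous_update(state, w, order):
    """Neurons update one at a time, each seeing changes already made."""
    state = list(state)
    for j in order:
        state[j] = step(net_input(j, state, w))
    return state


w = [[0, -1],
     [-1, 0]]  # two mutually inhibitory neurons
start = [1, 1]

sync_once = synchronous_update(start, w)
sync_twice = synchronous_update(sync_once, w)
async_once = asynchronous_update(start, w, order=[0, 1])
```

In this example the synchronous schedule flips both neurons on every step and never settles, while the one-at-a-time schedule reaches a state in which only one of the two mutually inhibitory neurons is active.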

The importance of learning dynamics becomes clear when considering that the
activation dynamics are controlled by the connections between neurons, represented
by the weight matrices, $w_{ij}$. There are two methods for establishing the values in
weight matrices, and the method utilized depends on the internal representational
scheme, problem complexity, available computing power, and time. These are

1. by hand and

2. algorithmically.

When local, internal representations are used in a neural model, specifying the
connection weights by hand is entirely feasible, providing the network is not immense.[5]
Using local representations, each neuron represents a single concept or entity, and it is
possible to determine the relationship between neurons on the basis of the desired
concepts and associations to be present. Those neurons that represent conflicting
concepts are connected with inhibitory connections (e.g., $w_{ij} = -1$); those neurons
symbolizing supporting concepts are linked with excitatory connections (e.g., $w_{ij} = 1$).
Unrelated neurons are not connected (e.g., $w_{ij} = 0$). The magnitudes of the connections
are typically constant throughout. Manual methods of developing connection strengths
are well suited to those domains that are well structured since relationships between
concepts (neurons) need to be well defined and understood.
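
Setting weights by hand in this way amounts to filling a matrix from a list of relationships. The sketch below is illustrative only: the concept names, the supporting and conflicting pairs, and the function name build_weights are all hypothetical.

```python
concepts = ["steel", "timber", "long_span", "low_cost"]  # hypothetical concepts
supports = [("steel", "long_span"), ("timber", "low_cost")]
conflicts = [("steel", "timber"), ("long_span", "low_cost")]


def build_weights(concepts, supports, conflicts, magnitude=1.0):
    """Build a symmetric weight matrix with one neuron per concept."""
    index = {name: k for k, name in enumerate(concepts)}
    n = len(concepts)
    w = [[0.0] * n for _ in range(n)]  # unrelated neurons stay at zero
    for pairs, value in ((supports, magnitude), (conflicts, -magnitude)):
        for a, b in pairs:
            i, j = index[a], index[b]
            w[i][j] = value
            w[j][i] = value  # same strength in both directions
    return w


w = build_weights(concepts, supports, conflicts)
```

A single magnitude is used for every connection, in keeping with the observation that hand-set magnitudes are typically constant throughout.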

[5] For hand crafted weight matrices, what is meant by a large network is subjective,
depending upon a developer's patience threshold. Typically, networks with tens to
hundreds of connections may use manual methods of setting up weights.

Algorithmic learning methods are employed when distributed representations
are used, the problem is large, or the domain is complex. There are two classes of
algorithmic learning methods, supervised and unsupervised. Most algorithmic
learning methods evolved from the concepts first presented by Hebb [Hebb49]. These
methods tune the weights of a network in such a way that the network performs as
desired for a given set of input patterns. Learning procedures are generally iterative
and involve a two step process. The first step of the iteration is to determine the
current state of activation, F, of the network in response to a set of input patterns,
$u_i$, $i = 1, \ldots, n$, where n represents the number of input patterns. For supervised
learning, F is used in combination with a set of desired target responses, T, to
calculate a measure of the quality of learning, E,

$$E_i = E(u_i, F_i, T_i)$$

The second step in the iterative process requires computing the necessary
change in the connection weights, W, to improve E:

$$\Delta W = \Phi(W, \eta, E, E'),$$

where $\eta$ is a matrix of updating rate parameters, and $E'$ is a matrix of derivatives of E.
Once $\Delta W$ is calculated, the weights are updated and a new iteration starts. The
iteration continues until E, the quality measure of learning, is acceptable. Learning
dynamics assume that changes in the weights do not affect the activation dynamics
within an iteration.
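
The two step iteration can be sketched for the simplest supervised case. This is not the procedure of any particular network discussed in this work: it assumes a single linear neuron, takes the measure E to be a sum of squared errors (so that smaller is better), uses one scalar rate in place of a matrix of rates, and all names and data are hypothetical.

```python
def respond(weights, pattern):
    """Response F of a single linear neuron to one input pattern."""
    return sum(w * x for w, x in zip(weights, pattern))


def train(patterns, targets, eta=0.1, tolerance=1e-6, max_iterations=10000):
    weights = [0.0] * len(patterns[0])
    error = float("inf")
    for _ in range(max_iterations):
        # Step 1: responses to all patterns and the resulting error measure.
        responses = [respond(weights, p) for p in patterns]
        error = sum((t - f) ** 2 for t, f in zip(targets, responses))
        if error < tolerance:
            break
        # Step 2: weight change from the rate and the derivative of the error.
        # The responses are held fixed while every weight is adjusted.
        for k in range(len(weights)):
            gradient = sum(-2.0 * (t - f) * p[k]
                           for t, f, p in zip(targets, responses, patterns))
            weights[k] -= eta * gradient
    return weights, error


patterns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [2.0, -1.0, 1.0]
weights, error = train(patterns, targets)
```

Computing all responses before changing any weight reflects the assumption stated above that weight changes do not affect the activation dynamics within an iteration.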

Both classes of learning processes essentially perform a search in a
multiple-dimensional weight space for an extreme point of E. What makes learning a difficult
process is that most weight spaces are full of local minima, some of which can be
deep and difficult to emerge from. Different learning processes are in essence different
types  of  search  procedures.  Both  supervised  and  unsupervised  learning  procedures 
have  limitations  and  research  continues  to  find  faster  and  more  reliable  training 
algorithms. 

The  performance  of  a  training  algorithm  is  dependent  upon  the  training  set,  the 
input  and  possibly  output  patterns  presented  during  learning.  In  general,  the  training 
set  should  fully  and  accurately  represent  the  problem  domain.  Features  and  concepts 
that  are  important  must  be  either  explicitly  or  implicitly  encoded  in  the  training  set; 
otherwise,  the  network  will  not  be  able  to  represent  them. 

Summary 
All  artificial  neural  networks  perform  essentially  the  same  function,  a  vector 
mapping.  That  is,  they  take  an  input  pattern  and  produce  an  output  pattern.  An 
artificial  neural  network  encodes  these  mapping  relationships  via  a  learning  process. 
Different  neural  networks  vary  greatly  in  the  range  of  mappings  that  they  can 
represent,  and  some  networks  are  quite  general  in  their  mapping  abilities.  The  ability  to 
map  conceptually  unifies  all  networks  at  a  high,  abstract  level;  however,  there  are 
many  other  appealing  properties  that  most  of  these  networks  share.  These  are  learning, 
generalization,  and  self  organization. 

Learning 

The  ability  of  a  network  to  learn  from  examples  (i.  e.,  experience)  has  created  a 
great  deal  of  interest  in  artificial  neural  networks.  However,  no  single  learning  process 
appears ideal. There are two classes of learning processes:

1. Supervised: each input pattern is paired with a target output pattern.
Together they form a training pair.

2. Unsupervised: target output patterns are not required; instead, the process
iterates until a consistent set of output patterns results. In effect, this
process extracts statistical properties of the input patterns and groups them
accordingly.

Generalization 

The goal of learning is not simply to reproduce the output patterns of the
training set. A simple lookup table would perform this task and not require a training
algorithm. We want a network to also generalize correctly. Generalization is the ability
to produce correct output patterns from input patterns that are not part of the training
set, and to remain relatively insensitive to minor variations in the input.
Generalization allows a neural network to accommodate variability, producing a
correct output pattern despite significant deviations from the training set.

Self  Organization  of  Knowledge 

Artificial neural networks neither explicitly store nor create knowledge. A
neural  network  learns  knowledge  and  represents  that  knowledge  in  the  connection 
strengths  between  processors.  Although  the  topology  of  a  network  is  static  in  most 
implementations,  the  network  self  organizes  the  learned  knowledge  through  the 
learning  process. 


HARMONY  THEORY 

This  chapter  introduces  a  particular  stochastic  neural  network  paradigm, 
Harmony  theory  networks,  that  is  well  suited  to  solving  constraint  satisfaction 
problems.  This  chapter  begins  by  giving  an  overview  of  how  neural  networks  function 
as  constraint  satisfaction  systems  and  then  proceeds  to  introduce  the  basics  behind 
Harmony  theory  networks.  The  types  of  constraint  satisfaction  problems  that  Harmony 
theory  networks  solve  in  this  work  are  derived  from  qualitative  reasoning  about 
preliminary designs. A basic qualitative analysis system using Harmony theory
networks is developed in this chapter. The qualitative reasoning system is
demonstrated in a series of preliminary structural design examples presented at the end
of this chapter.

Constraint  Satisfaction 
Given  a  preliminary  design  problem  that  is  at  least  partially  defined  by  a  set  of 
functional  specifications,  there  commonly  exist  a  large  number  of  feasible,  valid 
designs  that  meet  minimal  functional  specifications.  There  are  usually  far  fewer  that  are 
"good"  designs  in  terms  of  functional  performance;  however,  this  set  may  still  embody 
a  large  number  of  alternatives.  Satisfying  each  functional  requirement  in  turn  does  not 
guarantee  a  "good"  design,  nor  does  it  give  any  assurance  that  a  majority  of  the 
requirements  will  be  met.  Often  a  design  process  will  have  increasing  difficulty  as  a 
design progresses using this sequential strategy since satisfying each requirement in
turn without consideration of other requirements can make satisfying other
requirements difficult. If the requirements are able to simultaneously and mutually
constrain  and  guide  one  another,  then  there  is  an  increased  possibility  that  "good" 
design  candidates  will  result.  Such  a  simultaneous  system  explores  alternatives  without 
committing  to  a  particular  one  until  all  requirements  are  taken  into  account  to  some 
degree.  Artificial  neural  networks  support  simultaneous  consideration  of  multiple 
constraints  and  thus  intrinsically  exploit  a  "least  commitment"  problem  solving 
strategy.  The  design  examples  in  this  chapter  will  demonstrate  this  capability  of 
artificial  neural  networks.  The  following  paragraphs  describe  constraint  satisfaction 
problems  with  respect  to  design  and  how  artificial  neural  networks  can  be  used  to 
solve  constraint  satisfaction  problems. 

A  constraint  satisfaction  problem's  solution  encompasses  the  simultaneous 
satisfaction  of  a  large  number  of  constraints.  Oftentimes  there  is  no  perfect  solution 
where  all  the  constraints  are  completely  satisfied.  Requirements  in  preliminary  design 
problems  can  influence  and  bind  other  requirements  and  design  variables  such  that  the 
design requirements can be characterized as constraints. For example, a low weight
requirement can act as a constraint on the type of material selected and the volume of
material used in a design. Design problems typically involve large numbers of
conflicting constraints. Consider requirements for both low weight and small
displacements: increasing the amount of material will generally reduce the
displacements, but the overall weight will increase. Thus the weight and
displacement requirements cannot both be satisfied without some trade-off.
When all constraints cannot be perfectly satisfied, a solution involves the
satisfaction of as many constraints as possible.

Constraint  satisfaction  problems  made  up  from  a  number  of  variables  are 
presented  to  an  artificial  neural  network  such  that  each  neuron  (or  a  group  of 
neurons)  represents  a  possible  solution  value  for  a  variable.  Connections  between 
neurons  and  groups  of  neurons  represent  constraints  on  relations  between  variable 
values.  Connections  can  either  be  positive  (for  supporting  states)  or  negative  (for 
inhibitory  states). 

The  activation  dynamics  for  constraint  satisfaction  neural  networks  follow  the 
typical  pattern  for  updating  each  neuron's  output.  Input  to  a  neuron  is  calculated 
based  on  the  output  of  connected  neurons  and  the  strength  of  those  connections.  A 
neuron  then  processes  its  total  input  through  an  activation  function,  which  yields  the 
neuron's  activation  value.  The  activation  value  is  used  as  output  or  processed  through 
an  output  function  before  being  used  as  output.  This  neuron's  output  then  becomes 
the  input  to  all  the  neurons  that  are  connected  to  it.  A  neuron's  output  is  also  called  its 
activation  level  or  value.  The  input  to  some  neurons  in  a  network  is  affected  by  the 
external  environment.  In  addition,  networks  that  are  used  to  solve  constraint 
satisfaction  problems  typically  have  feedback  connections  such  that  the  network 
activation  dynamics  are  not  one  way  from  the  input  neurons  to  designated  output 
neurons. Instead, activation dynamics proceed in "waves" initially from input neurons
towards output neurons and then back from output neurons towards input neurons.
These dynamics typically proceed for a set number of neuron updates.

In design problems, an initial design task is given (e.g., design a minimum
weight,  stable  structure  to  support  a  given  set  of  forces  using  a  set  of  support 
locations).  These  starting  conditions  are  given  to  the  network  and  activate  appropriate 
neurons  for  that  initial  state.  The  activation  values  from  these  neurons  are  propagated 
through  the  network  via  the  connections  producing  input  to  other  connected  neurons. 
Those  neurons  that  support  the  initial  conditions  would  then  activate  and  produce 
outputs  that  would  propagate  further  through  the  network  as  input  to  other  connected 
neurons.  This  sequence  of  input,  activation,  output,  and  propagation  continues  for  a 
set  number  of  neuron  updates  such  that  all  neurons  that  produce  output  will  eventually 
be  activated.  A  constraint  satisfaction  problem  is  solved  when  the  output  activations  of 
a  particular  set  of  neurons  representing  the  best  solution  are  at  their  maximums.  The 
neural  network  does  this  by  maximizing  the  degree  of  constraint  satisfaction  of  each 
processor. Let $S_j$ refer to the activation (output) of neuron $j$; let $w_{ij}$ refer to the
connection (constraint) to neuron $i$ from neuron $j$; and let $\mathit{input}_i$ refer to any
external input to neuron $i$. The total input, $x_i$, to neuron $i$ is

$$x_i = \sum_j w_{ij} S_j + \mathit{input}_i \qquad (1)$$


The degree of constraint satisfaction for neuron $i$ is defined as

$$\mathit{degree}_i = x_i S_i \qquad (2)$$

where $S_i$ is the activation value of neuron $i$. The activation value of neuron $i$ is a
function of its input, $x_i$, and is determined from the sigmoid activation function
(Figure 18) defined as

$$S_i = \frac{1}{1 + e^{-x_i}} \qquad (3)$$

Figure 18: Sigmoid Activation Function

Each neuron mutually constrains other neurons connected to it, and a neuron
with positive input, $x_i$, will have a positive degree of constraint satisfaction. If the
magnitude of a positive $x_i$ is large, then $S_i$ is close to 1, resulting in a high
degree of constraint satisfaction for that neuron. A neuron with a high
activation value will tend to increase the activation values of positively connected
neurons and reduce the activation values of negatively connected neurons. When the
input, $x_i$, is negative, the activation, $S_i$, will be small and the degree of constraint
satisfaction for that neuron will be low. In this manner, each neuron mutually
influences those neurons connected to it such that the system of neurons as a whole
tends to maximize the total degree of constraint satisfaction defined as follows:

$$\mathit{DEGREE} = \sum_i \mathit{degree}_i \qquad (4)$$
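Equations (1) through (4) can be illustrated with a short numerical sketch. The Python below is only a minimal illustration under stated assumptions, not code from this work: the function names, the two-neuron weight matrix, and the input values are hypothetical and were chosen so that the arithmetic is easy to follow by hand.

```python
import math

def sigmoid(x):
    """Equation (3): map a total input to an activation between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def total_inputs(weights, activations, external):
    """Equation (1): x_i = sum_j w_ij * S_j + input_i for every neuron i."""
    return [
        sum(w_ij * s_j for w_ij, s_j in zip(row, activations)) + ext_i
        for row, ext_i in zip(weights, external)
    ]

def total_degree(inputs, activations):
    """Equations (2) and (4): sum of x_i * S_i over the neurons."""
    return sum(x_i * s_i for x_i, s_i in zip(inputs, activations))

# Hypothetical network: two neurons that support each other (w = +1),
# with an external input of 1 applied to the first neuron only.
weights = [[0.0, 1.0],
           [1.0, 0.0]]
external = [1.0, 0.0]
activations = [0.5, 0.5]  # sigmoid(0): neither neuron has any input yet

inputs = total_inputs(weights, activations, external)
activations = [sigmoid(x) for x in inputs]
degree = total_degree(inputs, activations)
print(inputs, round(degree, 3))
```

Repeating the last three assignments corresponds to further neuron updates; each pass recomputes the inputs from the current activations before the activations are refreshed.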

The total degree of constraint satisfaction increases only until all neurons are
maximally activated, which depends on the number of neuron updates that the
neural system is allowed to execute. Each neuron's activation level will increase until
the maximum value is reached with respect to its maximum input, and once this
occurs, the neuron will never change its local degree of constraint satisfaction unless
the input to the neuron changes. This process is essentially a hill-climbing optimization
procedure and can only be guaranteed to find a locally optimal solution. In order to
overcome local optima, thermodynamic neural network models were developed
[Hopfield82, Ackley85, Smolensky86].

As an example of a constraint satisfaction neural network, consider a linear
spring model as shown in Figure 19. We can relate the spring stiffness, $k$, to the end
displacement, $u$, and a force, $F$, using the following equation:

$$F = ku$$

Figure 19: Linear Spring Model

We can assume a simple relationship between the spring stiffness, $k$, the cross section
area of the spring, $A$, and the spring material's modulus of elasticity, $E$, using the
following equation:

$$k = EA$$

As constraints on this simple system, consider that when the stiffness increases
and the displacement increases, then the force on the spring should also increase. We
can represent this relationship with the following equation:

$$F^{+} = u^{+} k^{+}$$

Likewise, when the displacement and stiffness decrease, then the force must also
decrease. We can represent this relationship with the following equation:

$$F^{-} = u^{-} k^{-}$$

In a similar manner, when both $E$ and $A$ increase then $k$ also increases. When $E$ and $A$
decrease, then $k$ decreases. These two relationships are represented by the following
two equations:

$$k^{+} = E^{+} A^{+} \qquad k^{-} = E^{-} A^{-}$$

In setting up the neural network, we represent each variable and each relationship
between variables as a neuron. The constraints are the required direction of change
between each variable and the represented relationship. Figure 20 shows the network
just described. The constraints are shown as either a plus sign (+), representing a
connection strength of 1, or a minus sign (-), representing a connection strength of -1, in
the diagram. Connections are between variables and defined relationships. Note that
the sigmoid activation function produces activation values that range from 0 to 1. For
illustrative purposes, we will offset and scale the activation function for the neurons
representing the variables to range from -1 to 1. Further, when measuring the degree
of constraint satisfaction we will only use the input and activation terms of the
relationships (the lower neurons in Figure 20). If we also included the reciprocal
constraint in determining the degree of constraint satisfaction by considering the input
and activation values of the variables, we would be considering the effect of the
constraint twice. Obviously, there are many more relationships that could be used to
describe this system, but for illustrative purposes, they have been neglected.


Figure 20: Constraint Satisfaction Example

We would like the network to determine which relationships and variables will
be active for a given set of variable activation values. If $F$ and $A$ were to increase ($F^{+}$
and $A^{+}$), the network is to determine using constraint satisfaction which relationships
and variables will activate. For input to this network, both $F$ and $A$ have external input
values of 1. The input values for $u$, $k$, and $E$ are 0, and the activations of the lower four
neurons are initialized as 0 since they do not receive any external input. Based on this
initial state, activation values for the top five neurons are determined using the scaled
and shifted sigmoid activation function. These activation values are multiplied by the
connection strengths to become input for all connected neurons. The input values for
each of the lower four neurons in Figure 20 are, from left to right, 0.462, -0.462, 0.462,
and -0.462 at this point. These input values are converted to activation values by the
sigmoid activation function. Thus, the activation values for these same neurons are,
from left to right, 0.614, 0.386, 0.614, and 0.386, and the system has a degree of
constraint satisfaction of 0.21. If we allow the neural network to continue, the
activation values of the lower neurons will be propagated to the upper neurons. The
inputs for the upper neurons are, from left to right, 1.227, 0.227, 0.454, 0.227, and
1.227. Figure 21 shows the activation values for both layers of neurons after the new
inputs are processed for the upper neurons. The two upper neurons with external
inputs are indicated.

Figure 21: Activation Values After All Neurons Update

After a single update cycle, the neurons on the top level all have activations
closer to 1 than to -1, indicating that they should all increase, and the two lower neurons
with activations closer to 1 than to 0 indicate which relationships appear valid. Valid
relationships are indicative of satisfied constraints of this system. Figure 22 shows the
neuron activations after all neurons have updated six times. The degree of constraint
satisfaction of the system is now 1.998. The increase is due to the stronger responses
of those neurons that should be turned on with respect to the constraints; likewise,
neurons that should be off are heading in that direction. This trend continues as the
neuron activation values asymptotically approach the limits defined by the sigmoid
activation function.
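The spring example can be traced numerically. The sketch below is an illustration rather than the simulator used in this work. It assumes that the upper neurons are ordered F, u, k, E, A from left to right, that the lower neurons are ordered F+, F-, k+, k-, and that the two layers update synchronously, one after the other. The names `run`, `RELATIONS`, and `history` are hypothetical.

```python
import math

def sigmoid(x):
    """Equation (3); used for the relationship (lower) neurons."""
    return 1.0 / (1.0 + math.exp(-x))

def scaled_sigmoid(x):
    """Sigmoid offset and scaled to (-1, 1); used for the variable (upper) neurons."""
    return 2.0 * sigmoid(x) - 1.0

VARIABLES = ["F", "u", "k", "E", "A"]
# Each relationship neuron: (connection strength, variables it connects to).
RELATIONS = {
    "F+": (+1.0, ("F", "u", "k")),
    "F-": (-1.0, ("F", "u", "k")),
    "k+": (+1.0, ("k", "E", "A")),
    "k-": (-1.0, ("k", "E", "A")),
}

def run(external, cycles):
    """Return one record per update cycle (assumed synchronous, layer by layer)."""
    var_input = dict(external)
    history = []
    for _ in range(cycles):
        var_act = {v: scaled_sigmoid(var_input[v]) for v in VARIABLES}
        rel_input = {
            r: w * sum(var_act[v] for v in members)
            for r, (w, members) in RELATIONS.items()
        }
        rel_act = {r: sigmoid(x) for r, x in rel_input.items()}
        # DEGREE uses only the relationship neurons, as described in the text.
        degree = sum(rel_input[r] * rel_act[r] for r in RELATIONS)
        var_input = {
            v: external[v] + sum(
                w * rel_act[r]
                for r, (w, members) in RELATIONS.items() if v in members
            )
            for v in VARIABLES
        }
        history.append({"rel_input": rel_input, "rel_act": rel_act,
                        "degree": degree, "var_input": var_input})
    return history

external = {"F": 1.0, "u": 0.0, "k": 0.0, "E": 0.0, "A": 1.0}
history = run(external, cycles=6)
first, sixth = history[0], history[5]
print(first["rel_input"], first["rel_act"], first["degree"])
print(first["var_input"])
print(sixth["degree"])
```

Under these assumptions the first record should agree, to rounding, with the lower-neuron inputs, activations, and degree of constraint satisfaction quoted for the first update, and the sixth record with the degree quoted for six updates. A different update order would give different intermediate values.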


Figure 22: Activation Values After Six Update Cycles

This process is an unconstrained optimization procedure where the system's
measure of constraint satisfaction, DEGREE (equation 4), is the objective function,
and the activation values of neurons in the system are the variables. This technique
only requires the function values of the sigmoid activation function, and as the inputs
to a neuron gradually change the activation values also change. This process does not
use previous activation values or changes to each neuron's input to determine good
directions to maximize DEGREE. As a result, the system does not always locate the
best maximum but instead finds a local maximum, which is not necessarily the best
solution.

To move the system off the plateau of a local optimum, it is perturbed. If the local
optimum is actually a global optimum, then the system will likely return to that
solution after being perturbed a small amount. The process that moves the system off
local optimum plateaus is called simulated annealing. Simulated annealing is
analogous to annealing in physical systems, specifically how metals and crystals form
when cooled from liquid states. At relatively high temperatures, the atoms of a liquid
move freely with respect to each other, and as the liquid is cooled the atoms begin to
arrange themselves and form crystals. Crystal structures are minimum energy states
for the atoms. Provided the liquid is cooled slowly enough, large crystals will form,
but if the liquid is cooled too quickly, the system does not have the defined structure
of a crystal. The resulting amorphous system's energy state is higher than that of a
crystal.

A global parameter, analogous to temperature in physical systems, is included
in the sigmoid activation function. For high temperatures, the activation values for a
given positive input are lower because the temperature flattens the sigmoid activation
function. For low temperatures, the activation values are higher since the effect of the
temperature term steepens the function. Figure 23 shows two plots of the following
sigmoid activation function with the temperature term, $T$, included:

$$S = \frac{1}{1 + e^{-x/T}} \qquad (5)$$

where $x$ is the input to a neuron as defined in equation (1). $S_1$ in Figure 23 shows the
activation function for a temperature value of $T = 1$, and $S_{10}$ in Figure 23 shows the
activation function for a temperature value of $T = 10$.

Figure 23: Effect of Temperature on the Sigmoid Activation Function
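A few sample values make the flattening effect of the temperature term concrete. The sketch below evaluates equation (5) at the two temperatures plotted in Figure 23; the function name `sigmoid_t` and the sample inputs are illustrative choices, not part of the original work.

```python
import math

def sigmoid_t(x, temperature):
    """Equation (5): sigmoid activation with a temperature term T > 0."""
    return 1.0 / (1.0 + math.exp(-x / temperature))

for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    s1 = sigmoid_t(x, 1.0)     # curve S_1 in Figure 23
    s10 = sigmoid_t(x, 10.0)   # curve S_10 in Figure 23
    print(f"x={x:+.1f}  S_1={s1:.3f}  S_10={s10:.3f}")
```

For every nonzero input the T = 10 value lies closer to 0.5 than the T = 1 value, which is the sense in which a high temperature weakens a neuron's response.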

To  make  use  of  simulated  annealing  in  solving  a  constraint  satisfaction 
problem,  the  sigmoid  activation  function  from  equation  (3)  is  replaced  by  equation  (5), 
and  an  annealing  schedule  that  tells  how  the  temperature  will  be  gradually  lowered 
over the course of a set number of neuron updates is supplied. The duration and
magnitude of the annealing usually require trial and error experiments. The activation
dynamics proceed as initially described for constraint satisfaction problems, but the
activation values normally start with smaller magnitudes due to the initially higher
temperature.  As  the  neural  system  continues  to  update,  the  temperature  is  lowered, 
allowing  the  system  to  "cool."  The  activation  values  gradually  settle  onto  a  stable 
optimum  state.  It  is  important  to  realize  that  simulated  annealing  does  not  guarantee 
finding  a  global  optimum,  but  added  temperature  can  perturb  the  network  state,  giving 
the  network  enough  "energy"  to  move  off  a  local  optimum  plateau.  For  constraint 
satisfaction  systems  that  use  binary  or  discrete  activation  values,  simulated  annealing  is 
very  useful  in  moving  off  of  local  optimum  plateaus  that  are  characteristically  flat. 
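One way to picture an annealing schedule is as a list of temperatures, one per round of neuron updates. The text does not prescribe a particular schedule, so the geometric cooling used below is purely an assumption for illustration, as are the start and end temperatures, the number of steps, and the names `geometric_schedule` and `trace`.

```python
import math

def sigmoid_t(x, temperature):
    """Equation (5): sigmoid activation with a temperature term T > 0."""
    return 1.0 / (1.0 + math.exp(-x / temperature))

def geometric_schedule(t_start, t_end, steps):
    """Temperatures falling from t_start to t_end by a constant ratio (steps >= 2)."""
    ratio = (t_end / t_start) ** (1.0 / (steps - 1))
    return [t_start * ratio ** i for i in range(steps)]

# Hypothetical schedule: cool from T = 10 to T = 0.5 in eight steps.
schedule = geometric_schedule(t_start=10.0, t_end=0.5, steps=8)

# Response of a single neuron whose total input is held at x = 1 while cooling.
x = 1.0
trace = [sigmoid_t(x, t) for t in schedule]
for t, s in zip(schedule, trace):
    print(f"T={t:6.3f}  S={s:.3f}")
```

With a fixed positive input, the activation starts near 0.5 and rises as the temperature falls, which matches the description of activations starting with smaller magnitudes and settling as the system cools. In a full network the inputs would also change between steps.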

Harmony  Theory  Networks 
The basic mathematics of Harmony theory networks are very similar to those of
Boltzmann machine networks [Ackley85], but the architecture and motivation behind
Harmony theory networks are different. Smolensky states that the ultimate goal of
developing Harmony theory networks is "to develop a body of mathematical results
for the theory of information processing that complements the results of the classical
theory of (symbolic) computation" [Smolensky86, page 195]. Although Smolensky's
purpose is important, this work attempts to exploit Harmony theory networks as a
neural  network  paradigm  that  is  well  suited  to  some  types  of  design  problems. 
Exploring  the  capabilities  of  Harmony  theory  networks  in  the  field  of  design  theory  is 
the  purpose  of  this  chapter.  This  section  reviews  the  basic  operation  and  architecture 
of  Harmony  theory  networks. 

The  architecture  of  Harmony  theory  networks  differs  from  that  of  Boltzmann 
machine  networks  in  that  Harmony  theory  networks  have  two  distinct  layers  of 
neurons,  whereas  a  Boltzmann  machine  network  is  essentially  an  arbitrarily  connected 
network.  Figure  24  depicts  part  of  a  graphical  representation  of  a  Harmony  theory 
network,  showing  the  two  levels  of  neurons.  The  top  level  neurons  are  called 
representational  features  and  the  bottom  ones  are  called  knowledge  atoms.  It  should 
be  noted  that  connections  occur  only  between  levels,  not  within  a  level,  and 
connections are symmetric. The motivation for using two distinct layers of neurons
comes from Smolensky's desire to dynamically activate knowledge atom neurons.

Figure 24: Graphical Representation of a Harmony Theory Network (upper level:
Representational Features; lower level: Knowledge Atoms)

Conventional knowledge structures used in knowledge based systems are in the form of
frames and scripts, which are normally fixed symbolic descriptions [Waterman86]. In
contrast, a Harmony theory network does not presuppose a particular unchanging
depiction of the environment it represents; it dynamically activates knowledge atom
neurons based on the activation values of representational feature neurons.

The  basic  concept  of  Harmony  theory  networks  is  that  for  any  set  of  given 
representational  features,  a  Harmony  theory  network  will  find  a  set  of  knowledge  atom 
neurons  that  are  "harmonious"  with  the  given  features.  The  measure  of  harmony  of  a 
set  of  activated  knowledge  atom  neurons  is  similar  to  equation  (4).  A  description  of 
how  harmony  is  measured  in  Harmony  theory  networks  follows. 

The sigmoid activation function defined in equation (5) and depicted in Figure
23 is used in Harmony theory networks. The activation values resulting from equation
(5) are interpreted as the probability of a neuron being active, and rather than
continuous activation values, Harmony theory networks use binary activation values.
When the probability of activation is equal to or above 0.5, a neuron is active.
Representational feature neurons can take on binary activation values of ±1, and
knowledge atom neurons can take on binary activation values of 0 or 1.
Representational feature neurons describe the state of a situation based on features of
the represented environment. If a feature is present, then the activation values of the
neurons representing that feature are set to 1, and if the feature is not present, then the
activation values are set to -1. Knowledge atom neurons combine features into
configurations that are distinct relationships between features. When a knowledge
atom neuron is activated over the course of activation dynamics, the knowledge atom
neuron asserts that the associated features it requires are active. When a knowledge
atom neuron is not activated, it implies that the connected, supporting representational
features are not active.

The connections between the two levels take on trinary values (-1, 0, +1).
When there is a supporting relationship between representational feature neurons and
knowledge atom neurons, the connection values are +1. When the relationship between
neurons at the different levels is an opposing one, the connection values are -1. When
there is no relationship between representational feature neurons and knowledge atom
neurons, the connection values are 0.

For a given Harmony theory network, a solution to a problem is found by
fixing the inputs, $\mathit{input}_i$, to some of the representational features based on the state
of the external environment as represented by the network. Then, the neurons at both
levels are randomly updated. Representational feature neurons update by first
determining their total input, $x_i$, using equation (1). Then they process their total
input using the sigmoid activation function from equation (5). This equation results in
a probability of activation. When this value is equal to or greater than 0.5, the activation
output value is 1; otherwise, the activation output value is -1. Representational feature
neurons propagate their output activation values to the knowledge atom neurons.
Knowledge atom neurons update by first processing their inputs using equation (1)
and then determining their probabilities of activation using equation (5). Depending on
the probabilities, the knowledge atoms output activation values of 0 or 1.
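The update rules just described can be sketched for a very small network. The example below is a simplified illustration, not the simulator used in this work. It performs one deterministic sweep over each level instead of random updates, and the three-feature, two-atom connection matrix, the starting feature values, and all function names are hypothetical.

```python
import math

def probability(x, temperature):
    """Equation (5), read as the probability that a neuron is active."""
    return 1.0 / (1.0 + math.exp(-x / temperature))

def update_atoms(weights, features, temperature):
    """Knowledge atoms: input from equation (1), output 1 if p >= 0.5 else 0."""
    atoms = []
    for row in weights:
        x = sum(w * r for w, r in zip(row, features))
        atoms.append(1 if probability(x, temperature) >= 0.5 else 0)
    return atoms

def update_features(weights, atoms, external, temperature):
    """Features: input from equation (1), output +1 if p >= 0.5 else -1."""
    features = []
    for j, ext in enumerate(external):
        x = sum(row[j] * s for row, s in zip(weights, atoms)) + ext
        features.append(1 if probability(x, temperature) >= 0.5 else -1)
    return features

# Hypothetical network: rows are knowledge atoms, columns are features,
# and every connection value is -1, 0, or +1.
weights = [[+1, +1, -1],
           [0, -1, +1]]
features = [+1, +1, -1]
external = [1.0, 0.0, 0.0]  # only the first feature receives external input

atoms = update_atoms(weights, features, temperature=1.0)
features = update_features(weights, atoms, external, temperature=1.0)
print(atoms, features)
```

Because the rule described in the text thresholds the probability at 0.5, the temperature does not change which neurons switch on in this sketch; it matters when equation (5) is applied without thresholding.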

Each of the knowledge atom neurons is able to determine its harmony value,
$h_i$, which is a measure of how consistent the neuron is with respect to its connected
representational feature neurons. Each knowledge atom neuron determines its
harmony value from

$$h_i = \frac{\sum_j w_{ij} S_j}{n_i} - \kappa \qquad (6)$$

where $w_{ij}$ is the connection value from representational feature neuron $j$ to
knowledge atom neuron $i$; $S_j$ is the activation value of connected representational
feature $j$; and $n_i$ is a normalizing factor, which is the number of non-zero connections to
knowledge atom $i$. The parameter $\kappa$ acts as a threshold constant that regulates what
proportion of the knowledge atom neuron's input, $x_i$, must support the knowledge
atom neuron before that knowledge atom has a positive harmony value. The harmony
value for each knowledge atom neuron is similar to equation (2).

The harmony of the entire network is defined as

$$\mathit{harmony} = \sum_i S_i h_i \qquad (7)$$

where $h_i$ is determined for each knowledge atom neuron from equation (6), and $S_i$ is
the binary activation value (0 or 1) of each knowledge atom neuron $i$.

Each knowledge atom neuron's harmony value, $h_i$, is the proportion of
representational feature neurons that support the knowledge atom neuron minus the
proportion that is inconsistent, minus the threshold constant, $\kappa$. When $\kappa$ is close to -1,
the knowledge atom neuron requires almost none of the supporting representational
feature neurons to be activated for that knowledge atom neuron to contribute
positively to the system's harmony. When $\kappa$ is close to 1, the knowledge atom neuron
requires all of the representational feature neurons to be activated for that knowledge
atom neuron to contribute positively to the system's harmony. When $\kappa$ is 0, the
knowledge atom neuron will contribute positively to the system's harmony whenever
the number of consistent representational feature neurons exceeds the number of
inconsistent ones. To prevent a very weak matching criterion between $h_i$ and the
number of activated consistent representational feature neurons, $\kappa$ should be greater
than 0. Smolensky suggests using the following relation to determine good values of
$\kappa$:

$$1 > \kappa > 1 - \frac{2}{n} \qquad (8)$$

where $n$ is the maximum number of non-zero connections between representational
feature neurons and any knowledge atom neuron [Smolensky86, pages 222-223].
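A small calculation shows why relation (8) gives a strict matching criterion. If a knowledge atom has $n_i$ non-zero connections, every connected feature is ±1, and $m$ of those features are inconsistent with the atom, then the sum in equation (6) equals $n_i - 2m$ and $h_i = 1 - 2m/n_i - \kappa$. A perfect match ($m = 0$) is positive only if $\kappa < 1$, and a single mismatch ($m = 1$) is negative only if $\kappa > 1 - 2/n_i$. The Python sketch below checks this for one hypothetical atom; the function names, connection values, and the choice $\kappa = 0.5$ are illustrative assumptions.

```python
def atom_harmony(connections, features, kappa):
    """Equation (6): normalized agreement with the connected features, minus kappa."""
    n_i = sum(1 for w in connections if w != 0)
    agreement = sum(w * r for w, r in zip(connections, features))
    return agreement / n_i - kappa

def network_harmony(weights, features, atoms, kappa):
    """Equation (7): sum of h_i over the active (S_i = 1) knowledge atoms."""
    return sum(
        s * atom_harmony(row, features, kappa)
        for row, s in zip(weights, atoms)
    )

# Hypothetical atom with n = 3 non-zero connections, so relation (8)
# gives 1 > kappa > 1/3; kappa = 0.5 lies inside that range.
connections = [+1, +1, -1]
kappa = 0.5

perfect = atom_harmony(connections, [+1, +1, -1], kappa)       # 3/3 - 0.5
one_mismatch = atom_harmony(connections, [+1, +1, +1], kappa)  # 1/3 - 0.5

# Two-atom network in which only the first atom is active.
weights = [connections, [0, -1, +1]]
total = network_harmony(weights, [+1, +1, -1], atoms=[1, 0], kappa=kappa)
print(perfect, one_mismatch, total)
```

With a value of $\kappa$ below the lower bound in (8), such as 0.2, the same single mismatch would still yield a positive harmony value, which is the weak matching that the relation is meant to exclude.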

With a given annealing schedule that defines both temperature values and
number of neuron updates, Harmony theory networks tend to activate those knowledge
atom neurons that are consistent with given sets of representational feature neuron
inputs.

Smolensky was interested in exploring cognitive theories in what he calls the
"subsymbolic paradigm" [Smolensky86, page 195] in such a way as to complement
existing symbolic paradigms.¹ Combining representational features such that the
resulting active knowledge atoms complete a static description of the state of the
environment is such a link. Perception and logical reasoning about a static sensory
input (representational features) is a completion task that applies to many cognitive
efforts. Smolensky's premise behind Harmony theory networks can be stated as follows:

The harmony principle ... is an engine for activating coherent
assemblies of atoms and drawing inferences that are consistent with the
knowledge represented by the activated atoms. [Smolensky86, page
203]

¹ Newell [Newell80] describes the mathematical theory of symbolic computation,
which is problem solving based on the application of strategies and heuristics to
manipulate symbols representing problem concepts.

Figure 25: Completion Problem Solution Procedure (steps shown: Input, create a
partial description of the environment by fixing some feature values; Activation,
activate knowledge that is [...]; Inference, activate unknown features [...] active
knowledge.)

Many cognitive tasks require inference, and completion tasks are one such inference
task. In a completion task, a partial description of a situation is given and the solution
requires completion of the description by supplying the missing information.
Figure 25 shows the procedure for performing a completion task. Initially, input values
are assigned to some representational features, which results in the activation of some
knowledge atoms. Inference occurs when the activation of knowledge atoms
causes previously inactive representational features to become active. Activation on
both levels occurs in a consistent manner due to the connection values between levels,
and as a result, the activation and inference processes mutually constrain each other.

Experts  use  their  experience  to  identify  abstractions  useful  for  problem  solving 
in  their  domains.  These  abstractions  are  the  basis  for  representational  features  in 
Harmony  theory  networks.  Likewise,  experts  must  identify  knowledge  atoms  as 
relationships  between  representational  features,  and  connections  naturally  occur 
between  related  knowledge  atoms  and  representational  features.  Before  connection 
values  can  be  fixed,  a  given  set  of  representational  features  and  knowledge  atoms  are 
divided  into  sets  of  neurons.  The  connections  between  representational  feature 
neurons  and  knowledge  atom  neurons  are  then  set  to  correspond  to  the  relationship 
defined by the knowledge atom neurons. In the neural network simulator for Harmony
theory networks supplied by McClelland and Rumelhart [McClelland88], learning
dynamics that computationally set the connection values are not implemented. As a
result, the connections between neurons and their values must be hand coded in an input
file. This is a tedious and error prone process that effectively limits the size and scope
of problems that McClelland and Rumelhart's neural network simulator can represent
for Harmony theory networks.

Smolensky  [Smolensky86]  implies  that  Harmony  theory  networks  can  employ 
learning  dynamics  where  the  connection  values  are  set  using  a  supervised  learning 
process.  Supervised  learning  typically  requires  learning  dynamics  that  employ  the 
network's  activation  dynamics,  and  supervised  learning  usually  presents  examples 
thousands of times before the connection values are properly set. Supervised
learning can therefore take days even when the activation dynamics that
update each neuron's output cost fractions of a second. The activation dynamics of
Harmony theory networks use an annealing schedule with a fixed number of neuron
updates, which requires minutes to execute on most computers. As a result,
supervised learning for Harmony theory networks appears impractical because of the
extreme training times expected.

Qualitative  Reasoning 
Qualitative  analysis  of  a  design  artifact  typically  determines  how  different 
variables  change  in  relation  to  changes  in  other  variables  without  resorting  to  specific 
numeric  values.  The  reasoning  is  based  on  cause  and  effect  connections  between 
design  features.  Qualitative  knowledge  originates  in  first  principles,  such  as  static 
equilibrium  or  constitutive  laws,  which  are  fundamental  to  a  design  problem  domain. 
Using first principles, human designers identify relevant design features and variables
for use with their qualitative reasoning tasks. An automated preliminary design
evaluation model should utilize these same features and variables.

Most  sets  of  design  features  exist  in  a  continuous  domain.  Qualitative  analysis 
of  a  continuous  domain  requires  that  relations  between  typical  design  variables  map 
into  a  discrete  qualitative  set  [Forbus85].  The  simplest  of  these  discrete  sets  includes 
only  increasing,  decreasing  and  stationary  (+,  -,  0)  relations,  and  the  reasoning  is  about 
the  kinds  of  changes  that  can  occur  based  on  physical  laws.  For  example,  Newton's 
second  law  of  motion  (F  =  ma)  expresses  a  relationship  between  force,  mass,  and 
acceleration.  This  relationship  may  be  examined  in  a  qualitative  manner  such  that  if  m 
does  not  change  (0),  and  a  increases  (+),  then  F  must  increase.  Thus,  relationships 
between  variables  are  described  using  the  terms  increase,  decrease  and  stationary. 
Even  though  more  elaborate  qualitative  domains  exist,  it  is  reasonable  to  assume  that 
even  the  simplest  discrete  qualitative  set  will  give  insight  into  the  potential  of  using 
Harmony  theory  networks  for  qualitative  reasoning  tasks. 
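
The Newton's second law example above can be sketched in a few lines. This is only an illustration: the function name `qualitative_product` is hypothetical, and the sketch assumes that both quantities are positive so that the direction of change of a product follows from the directions of change of its factors.

```python
# Qualitative values: '+' increasing, '-' decreasing, '0' unchanging.
def qualitative_product(dx, dy):
    """Possible directions of change of z = x * y.

    Assumes x and y are positive quantities. Opposing changes are
    ambiguous because the magnitudes of change are unknown.
    """
    if dx == dy:
        return {dx}
    if dx == "0":
        return {dy}
    if dy == "0":
        return {dx}
    return {"+", "0", "-"}  # one factor increases, the other decreases


# F = m * a with mass unchanged (0) and acceleration increasing (+).
print(qualitative_product("0", "+"))  # {'+'}
# Opposing changes leave the result undetermined.
print(sorted(qualitative_product("+", "-")))
```

The ambiguous case is the one that later forces several knowledge atoms to be encoded for the same pair of input directions.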

Physical  laws  may  express  a  qualitative  relationship  between  features  of  a 
problem  description  and  can  be  viewed  as  constraints  among  these  parameters 
[Kuipers85].  Qualitative  analysis  produces  a  behavioral  description  that  specifies  the 
relationships  and  directions  of  change  between  design  features  based  on  first 
principles.  Agogino  and  Almgren  [Agogino87]  have  shown  the  merits  of  qualitative 
analysis, but their focus has been on the detailed, parametric design phase. This work
focuses on preliminary design and considers high level abstractions of design
problems. These abstractions are similar to those that human designers employ, which
they base on physical laws and relationships between features of those laws. The linear
spring model in Figure 19 does not consider all possible physical relationships such as
its mass or length. These features are directly ignored in the two equations used to
characterize the spring.

Figure 26: Building Abstraction for Determining Natural Frequencies
[Figure: the building floors are abstracted as lumped floor masses on a fixed base]

As another example of an abstraction that designers might employ at the
preliminary design stage, consider the design of a tall building. At the preliminary
design stage, we might want to estimate its natural frequencies in order to design for
cyclic loads. Instead of detailing the entire structural frame in three dimensions, we
could abstract the structure as a two dimensional, fixed base, lumped mass model as
shown in Figure 26. The abstract structure eliminates three dimensional effects,
distributed masses, foundation modeling, and many more details, but it gives designers
an indication of resulting natural frequencies associated with different stiffness and
mass characteristics. The equations relating the natural frequencies, stiffness, and
masses resulting from the abstract representation of the problem provide a basis for
performing qualitative analyses for preliminary design.

Qualitative analysis and reasoning play an important role in preliminary design
by  providing  designers  with  information  about  the  expected  performance  of  design 
artifacts.  Unfortunately,  to  date,  there  has  been  only  limited  success  in  utilizing 
computers  for  this  task.  This  is  due  to  the  limited  success  in  developing  robust, 
generalized  processes  for  working  with  abstractions  in  preliminary  design.  By 
developing  an  automated  qualitative  analysis  system  that  uses  acceptable  abstractions 
of  preliminary  design  problems,  designers  can  explore  the  characteristics  of  their 
design  artifacts  without  having  to  perform  detailed  design  of  the  artifact. 

In  summary,  a  neural  network's  constraints  are  identifiable  from  those  physical 
laws  that  characterize  basic  qualitative  relationships  between  design  variables.  It  will 
be  shown  in  the  following  examples  that  artificial  neural  networks  provide  an  approach 
to  qualitative  analysis  and  reasoning  about  preliminary  designs. 

Examples 
The following set of problems illustrates how Harmony theory networks can
derive qualitative answers to problems associated with the evaluation and analysis of
preliminary design problems. It is assumed that specific requirements are given in a
design problem description, and the network is asked to qualitatively evaluate a design
problem to determine the best way to meet those requirements. This evaluation might
guide the decomposition of a design problem into subproblems, each of which has
specific functional requirements as directed by the qualitative analysis. Each example
describes the identification of its representational feature neurons and knowledge
atom neurons. The features and requisite knowledge abstractions used in these
examples will be familiar to most engineers. Finally, each example describes its
formulation and performance using a Harmony theory network simulator from
McClelland and Rumelhart [McClelland88].

Structure  1 

Figure 27: Structure 1

A simple "abstract" structural design problem may be stated as: "Design a stable
structure, to support a loading, such that the displacements are small and the stresses
are within allowable ranges." The simple system representing a structure under loading
is shown in Figure 27 as a spring model. The spring stiffness, k, is an abstraction of the
structure's overall stiffness, and the force, F, and displacement, u, represent applied
loads and resulting displacements of the design. Presuming that adequate supports exist
for stability, the following basic physical principles (equilibrium and constitutive
relationships), expressed in equation form, are applicable at this level of abstraction.

$$F = k \cdot u \qquad f = E \cdot \varepsilon$$

The above two equations are a basis for qualitative relationships among important
abstract design variables, which are: stiffness, k, displacement, u, external force, F,
elastic modulus, E, strain, ε, and internal force, f. It is assumed that details of the
structure represented by the abstract spring model will be specified later in the design
process; however, even an abstract design can possess significant attributes.

A qualitative analysis should support refinements in the design such that a
feasible design could be developed with reasonable assurance that it meets functional
specifications. In order to do this using a Harmony theory network, a set of
representational features and knowledge atoms must be identified. The
representational features should embody the requisite qualitative facets. The above
equations describing the abstract structure provide the basis for a set of
representational features. Any other related equations may also be incorporated in the
qualitative analysis. More equations further constrain the solution and usually make a
solution easier to identify. The following equation describing the stiffness is therefore
added:

$$k = E \cdot A$$

The addition of this equation introduces another representational feature, the
abstract cross sectional area, A. At this level of abstraction, these equations embody
abstractions a qualitative analysis may use. An additional abstract relationship is
introduced to reduce the number of required representational features:

$$\varepsilon = \frac{u}{L}$$

This equation allows the strain, ε, to be replaced by u (assuming L = 1),
resulting in a total of six features {k, u, F, f, E, A}. Reducing the number of features
does not adversely affect the number of constraining qualitative relationships but
facilitates the propagation of activations. The behavior of these features needs to be
qualitatively represented as increasing (+), decreasing (-), or unchanging (0). Two
binary neurons are assigned to each feature, one indicating the presence of change, the
other indicating the direction as either increasing or decreasing. Thus, the Harmony
theory network model for this system requires a total of twelve representational
feature neurons.




Knowledge atom neurons represent knowledge about the domain: the equilibrium,
stress-strain, and stiffness equations. All possible relationships between features in
these equations must be determined and represented by a processor. For example, the
equilibrium equation can describe a number of different relations between parameters.
If k and u increase, then F must also increase. Similarly, if k increases and u is
unchanging, then F will increase. If k increases and u decreases, then F can increase,
decrease, or remain unchanged since, qualitatively, we cannot determine the
magnitudes of change. All possible valid combinations need to be encoded into the
network. For each three variable equation, there are thirteen legitimate qualitative
relationships as shown in Table 2. With three equations, this leads to 39 knowledge
atom neurons. The size of the entire network is 51 neurons and 234 connections. A
portion of this network is shown in Figure 24. The connections between neuron levels
symbolize the qualitative constraints that comprise a portion of a valid abstract design
state.

Table 2: Qualitative Relationships

Variable   Direction of Change
x          +   +   +   +   +   0   0   0   -   -   -   -   -
y          +   0   -   -   -   +   0   -   +   +   +   0   -
z          +   +   +   -   0   +   0   -   +   0   -   -   -
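
The count of thirteen relationships per equation can be reproduced by enumeration. The sketch below is illustrative only; it assumes a relation of the form z = x · y between positive quantities, and the helper name `possible_z` is hypothetical.

```python
from itertools import product

SIGNS = ("+", "0", "-")


def possible_z(dx, dy):
    """Directions of z = x * y consistent with dx and dy (positive x, y assumed)."""
    if dx == dy:
        return [dx]
    if "0" in (dx, dy):
        return [dy if dx == "0" else dx]
    return list(SIGNS)  # opposing changes: the magnitudes decide the outcome


# Every legitimate (x, y, z) combination of directions of change.
relations = [(dx, dy, dz)
             for dx, dy in product(SIGNS, repeat=2)
             for dz in possible_z(dx, dy)]

print(len(relations))      # 13
print(3 * len(relations))  # 39 knowledge atom neurons for three equations
```

Seven of the nine input pairs determine z uniquely; the two pairs with opposing changes each contribute three relations, which gives 7 + 6 = 13.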

With the network formulated, it is now possible to perform a qualitative
analysis of this structural system. The annealing schedule is shown in Figure 28. The
annealing schedule was developed intuitively, from experience. The proportionality
constant, κ, for equation (6) is set to 0.75.² The input to this network signifies the
functional design requirements and is as follows:

•  The applied loading is expected to be unchanging ($F^{0}$).

•  A single material will be used, so the elastic modulus is unchanging ($E^{0}$).

•  The displacements should be small, so indicate u as decreasing ($u^{-}$).

•  The stresses should be small, so indicate f as decreasing ($f^{-}$).

Given this input, the network updates the activations of each neuron in a
random order. Figure 29 shows the system's harmony value as it changes for each
cycle. As can be seen from Figure 29, the harmony steadily approaches a value of
0.75.³ This indicates that the system is settling onto a maximum. The active knowledge
atom neurons after 300 cycles are

$$F^{0} = k^{+} \cdot u^{-} \qquad f^{-} = E^{0} \cdot u^{-} \qquad k^{+} = E^{0} \cdot A^{+}$$

The qualitative analysis indicates that to expect low displacements and stresses, high
stiffness ($k^{+}$) and large cross section properties ($A^{+}$) should occur. The
stiffness and cross section referred to are abstract in a sense, and these qualitative
results are expected to guide the design during refinement as more details are added to
the structure.

Figure 28: Annealing Schedule

Figure 29: Harmony Function Values

² The proportionality constant is determined from

$$1 > \kappa > 1 - 2/n$$

Three features are associated with each knowledge atom, and two processors represent
each feature; therefore, n = 6, which yields

$$1 > \kappa > 0.67$$

When n = 6, Smolensky suggests using κ = 0.75 [Smolensky86, page 244].

³ Since one qualitative equation for each equilibrium, stress, and stiffness relationship
should be active at one time, the best harmony contribution, h (from equation 6), we
can expect for each knowledge atom neuron is

$$h = \frac{6}{6} - \kappa = 1 - 0.75 = 0.25$$

With three such knowledge atom neurons active, the ideal system harmony is 0.75.


While the program is running, the three knowledge atom neurons representing
the above constraints on the representational features are not continually activated due
to the stochastic nature of the neuron activations. Early in the settling process, the
Harmony theory network rapidly activates and deactivates knowledge atom neurons.
This is due to the high initial temperature, which decreases the probability that any one
knowledge atom will remain active. As the system is cooled, the probabilities that
these knowledge atom neurons are active increase since their individual harmonies
contribute positively to the system's total harmony. The constitutive constraint
($f^{-} = E^{0} \cdot u^{-}$) is satisfied first since it is implicitly specified by the
design specifications. Next, the equilibrium constraint ($F^{0} = k^{+} \cdot u^{-}$) is
satisfied, which induces the stiffness constraint ($k^{+} = E^{0} \cdot A^{+}$) to settle.
Those knowledge atom neurons that do not contribute to a positive system harmony tend
to become inactive.
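
The effect of temperature on a single knowledge atom can be illustrated with a small sketch. It assumes the logistic update rule commonly associated with Harmony theory and other annealed stochastic networks; the simulator's exact rule and the schedule of Figure 28 are not reproduced here, and the temperatures below are hypothetical.

```python
import math


def activation_probability(harmony_gain, temperature):
    """Probability that a knowledge atom turns on (logistic rule, an assumption)."""
    return 1.0 / (1.0 + math.exp(-harmony_gain / temperature))


gain = 0.25  # harmony gained by activating a perfectly matched knowledge atom
for temperature in (2.0, 0.5, 0.05):  # hypothetical cooling steps
    print(temperature, round(activation_probability(gain, temperature), 3))
```

At a high temperature the probability stays close to one half, so atoms switch on and off almost at random; as the temperature falls, an atom with a positive harmony contribution is active almost all of the time, and one with a negative contribution is almost always inactive.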

Interesting network behavior and a challenging qualitative analysis are seen when
a problem is posed in a more ambiguous manner. The problem is revised and
presented to the network as

•  The displacements should be small, so indicate u as decreasing ($u^{-}$).

•  The stress should be small, so indicate f as decreasing ($f^{-}$).

Since F and E are unspecified, these features do not directly assist in constraining
other processors, making a consistent, harmonious result more difficult to find. The
same annealing schedule was employed and the system's resulting harmony function
values are similar to those plotted in Figure 29 since the system harmony still steadily
approaches 0.75 with peaks occurring near 100, 200, and 300 cycles. After 100 cycles
with a system harmony of 1.0, the following constraints were active:

$$F^{0} = k^{+} \cdot u^{-} \qquad f^{-} = E^{+} \cdot u^{-}$$

$$k^{+} = E^{+} \cdot A^{+} \qquad k^{+} = E^{+} \cdot A^{-}$$

This  result  is  consistent  with  respect  to  the  input  specifications,  but  the  network 
activated  two  stiffness  constraints,  which  is  inconsistent  since  they  conflict  with  each 
other.  At  this  stage  of  processing,  the  network  could  not  specify  which  stiffness 
constraint  is  best;  however,  both  are  consistent  when  taken  individually,  and  both 
support  the  equilibrium  and  constitutive  constraints.  It  should  be  noted  that  the  system 
harmony  at  this  point  is  greater  than  the  maximum  ideal  system  harmony  of  0.75. 
Since both stiffness knowledge atom neurons are activated, they both contribute
positively to the system harmony, resulting in a 0.5 contribution. However, it should
be expected that since each stiffness knowledge atom neuron wants a different area
representational feature, these two knowledge atom neurons will compete. After 200
cycles, with a system harmony of 0.75, the following constraints were active:

$$F^{-} = k^{+} \cdot u^{-} \qquad f^{-} = E^{-} \cdot u^{-} \qquad k^{+} = E^{-} \cdot A^{+}$$

At this stage both the equilibrium and stiffness constraints changed from the 100 cycle
result, but all constraints are still consistent with respect to the input specifications.
After 300 cycles, the same set of constraints remains active with a system harmony of
0.75.
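
The harmony values quoted for Structure 1 can be reproduced with a short sketch. It assumes the per-atom harmony takes the form of the normalized match between feature neurons and the atom's pattern, minus the proportionality constant, as the footnotes suggest; equation (6) itself is not shown in this section, and the six-value pattern below is hypothetical.

```python
KAPPA = 0.75  # proportionality constant used in the examples


def atom_harmony(features, atom):
    """Harmony of one active knowledge atom: (features . atom) / n - KAPPA."""
    n = len(atom)
    match = sum(r * k for r, k in zip(features, atom))
    return match / n - KAPPA


atom = [+1, +1, +1, -1, +1, +1]  # hypothetical pattern over six feature neurons
perfect = list(atom)             # every feature neuron agrees with the atom
one_off = [-atom[0]] + atom[1:]  # a single feature neuron disagrees

print(atom_harmony(perfect, atom))      # 0.25
print(3 * atom_harmony(perfect, atom))  # 0.75, three consistent atoms
print(4 * atom_harmony(perfect, atom))  # 1.0, four atoms active at once
print(atom_harmony(one_off, atom) < 0)  # True
```

Under this form, a single mismatched neuron already makes the contribution negative, which is the role of the lower bound 1 - 2/n on the constant. Four simultaneously matched atoms give the harmony of 1.0 observed after 100 cycles, above the ideal value of 0.75.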

Structure  2 

This example is similar in form to the previous one but shows how the network
can handle simultaneous equations and larger problems. Figure 30 displays the abstract
structure. It represents a very high level description of a design problem, where the
design variables have been chosen to embody a functional decomposition into
substructures, $k_1$, $k_2$, $k_3$, and $k_4$. The dimensions of the different
substructures are eliminated in the substructure abstractions, but the connectivity
between substructures is retained in this model. A qualitative evaluation will provide
guidance for subsequent subproblems such that the resulting design will satisfy the
design requirements.

Each substructure has an associated stiffness, $k_i$, and material property, E;
areas, $A_i$, provide a representation of the amount of material the resulting
substructure design requires, and the level of stress in each substructure is represented
by $f_i$. External loads and displacements at critical junctures are represented by
$F_i$ and $u_i$, respectively. Substructures 1, 3, and 4 are directly supported, while
substructure 2 can be supported by the other substructures.

Figure 30: Structure 2

Using the direct stiffness method, the following simultaneous equilibrium equations
result:

$$(k_1 + k_2 + k_4) \cdot u_1 - k_2 \cdot u_2 = F_1 \qquad (9)$$

$$-k_2 \cdot u_1 + (k_2 + k_3) \cdot u_2 = F_2 \qquad (10)$$

In the form above, these equations are difficult to represent qualitatively; therefore,
each equation is decomposed such that equation (9) becomes

$$k_1 + k_2 = t_0 \qquad t_0 + k_4 = t_1 \qquad t_1 \cdot u_1 = t_2 \qquad k_2 \cdot u_2 = t_3 \qquad t_2 - t_3 = F_1$$

In a similar manner, equation (10) is decomposed into

$$k_2 \cdot u_1 = t_4 \qquad k_2 + k_3 = t_5 \qquad t_5 \cdot u_2 = t_6 \qquad t_6 - t_4 = F_2$$
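
A quick numeric check shows that chaining three-variable relations through intermediate terms reproduces equations (9) and (10). The sketch is illustrative; the function names and the sample values are hypothetical.

```python
def equilibrium_direct(k1, k2, k3, k4, u1, u2):
    """Loads F1 and F2 evaluated directly from equations (9) and (10)."""
    f1 = (k1 + k2 + k4) * u1 - k2 * u2
    f2 = -k2 * u1 + (k2 + k3) * u2
    return f1, f2


def equilibrium_decomposed(k1, k2, k3, k4, u1, u2):
    """The same loads built only from steps that relate three variables."""
    t0 = k1 + k2
    t1 = t0 + k4
    t2 = t1 * u1
    t3 = k2 * u2
    f1 = t2 - t3
    t4 = k2 * u1
    t5 = k2 + k3
    t6 = t5 * u2
    f2 = t6 - t4
    return f1, f2


values = (2.0, 3.0, 5.0, 7.0, 0.5, 0.25)  # arbitrary stiffnesses and displacements
print(equilibrium_direct(*values))      # (5.25, 0.5)
print(equilibrium_decomposed(*values))  # (5.25, 0.5)
```

Each intermediate term becomes an additional representational feature, which is the price paid for restricting every knowledge atom to three variables.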


The three constitutive relationships for substructures 1, 3, and 4 are

$$f_1 = E \cdot u_1 \qquad f_3 = E \cdot u_2 \qquad f_4 = E \cdot u_1$$

The second constitutive relationship, for substructure 2, involves four variables and
must be further decomposed into

$$d = u_2 - u_1 \qquad f_2 = E \cdot d$$

The four substructure stiffness relationships are

$$k_1 = E \cdot A_1 \qquad k_2 = E \cdot A_2$$

$$k_3 = E \cdot A_3 \qquad k_4 = E \cdot A_4$$


The above equations result in 25 representational features {$k_1$, $k_2$, $k_3$, $k_4$,
$u_1$, $u_2$, $F_1$, $F_2$, E, $f_1$, $f_2$, $f_3$, $f_4$, $A_1$, $A_2$, $A_3$, $A_4$,
$t_0$, $t_1$, $t_2$, $t_3$, $t_4$, $t_5$, $t_6$, d}, with the $t_i$ terms as intermediate
variables supporting the decomposition of the equations. With 18 equations of three
variables each and thirteen qualitative relationships per equation (see Table 2), there
are 234 knowledge atom neurons. The total number of neurons for this model is 284,
with 1404 connections. If one knowledge atom neuron for each of the 18 equations
positively contributes its maximum to the system harmony, the ideal system harmony
will have a value of 4.5.
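
The network sizes quoted for the first two structures follow from simple bookkeeping, sketched below. The function and constant names are hypothetical; the counts per feature and per equation are those stated in the text.

```python
RELATIONS_PER_EQUATION = 13  # Table 2
NEURONS_PER_FEATURE = 2      # presence of change, direction of change
FEATURES_PER_EQUATION = 3
MAX_ATOM_HARMONY = 0.25      # best contribution of one knowledge atom neuron


def network_size(num_features, num_equations):
    """Return (total neurons, connections, ideal system harmony)."""
    feature_neurons = NEURONS_PER_FEATURE * num_features
    atom_neurons = RELATIONS_PER_EQUATION * num_equations
    connections = atom_neurons * FEATURES_PER_EQUATION * NEURONS_PER_FEATURE
    ideal_harmony = MAX_ATOM_HARMONY * num_equations
    return feature_neurons + atom_neurons, connections, ideal_harmony


print(network_size(6, 3))    # (51, 234, 0.75)   Structure 1
print(network_size(25, 18))  # (284, 1404, 4.5)  Structure 2
```

Every additional three-variable equation adds thirteen knowledge atom neurons and 78 connections, in addition to any intermediate features the decomposition introduces.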

The annealing schedule and the value of the proportionality constant κ are identical
to those of the previous example. This model performs similarly to the previous one.
Given a well posed
problem,  the  network  activates  consistent  constraints.  The  following  test  problem  was 
given  to  the  network: 

•  The applied loading is expected to be unchanging ($F_1^{0}$ and $F_2^{0}$).

•  A single material will be used, so the elastic modulus is unchanging ($E^{0}$).

•  The displacements should be small, so indicate them as decreasing ($u_1^{-}$ and $u_2^{-}$).

•  The stresses should be small, so indicate them as decreasing ($f_1^{-}$, $f_2^{-}$, $f_3^{-}$, and $f_4^{-}$).

The active stiffness constraints after 300 cycles are

$$k_1^{+} = E^{0} \cdot A_1^{+} \qquad k_2^{+} = E^{0} \cdot A_2^{+}$$

$$k_3^{+} = E^{0} \cdot A_3^{+} \qquad k_4^{+} = E^{0} \cdot A_4^{+}$$

With ambiguous input, the network produces inconclusive results, as expected and
previously  shown.  Even  though  inconclusive  results  emerged,  they  are  consistent  to 
the  highest  degree  applicable  with  ambiguous  input.  All  tests  produced  valid  results 
within  300  cycles  as  defined  by  the  annealing  schedule.  Since  the  neurons  are  updated 
in  a  random  order,  ambiguous  input  will  produce  different  results  when  the  network  is 
run  twice  with  the  same  input,  but  the  activated  knowledge  atom  neurons  will  still  be 
consistent  with  respect  to  the  input  specifications. 

Structure  3 

Figure 31 shows another preliminary design problem, which is a proposed
three  story  building.  The  primary  design  requirement  is  to  design  a  building  whose 
lowest,  fundamental  natural  frequency  is  large.  This  structural  dynamics  design 
problem  requires  a  designer  to  balance  a  design's  stiffness  and  mass  characteristics  to 
achieve  the  desired  artifact.  What  makes  this  problem  challenging  is  that  oftentimes 
altering  a  structure's  stiffness  drastically  modifies  its  mass  such  that  the  expected 
natural  frequency  becomes  smaller  rather  than  larger. 

The fundamental equation used to describe a structural system's undamped
free vibrations is the matrix form of the frequency equation

$$|\mathbf{k} - \omega^{2} \mathbf{m}| = 0$$

Expanding the determinant yields an algebraic equation of the Nth degree in the
frequency parameter $\omega^{2}$ for a system having N degrees of freedom. For the
three story building frame shown in Figure 31, the mass of this frame is lumped in the
girders for each story, and the columns are assumed to be weightless. These
assumptions are an abstraction of the actual physical situation since we are ignoring
the details of the columns' axial stiffness and weight. The $d_i$ terms for each story
signify the mass per unit length of each floor system. The width of the frame is w
while the height of each story is denoted by the $L_i$ terms. The following equation
defines lumped masses for each story:

$$m_i = d_i \cdot w$$

Figure 31: Structure 3

The assumption that the girders are rigid lets the columns between each story act as
simple lateral springs, producing a system with three degrees of freedom. The sidesway
stiffness of each story is proportional to the columns' material properties, cross section
properties, and length as shown in the following approximation:

■    a 

For  this  problem,  the  dimensions  of  the  frame  are  the  primary  design  variables, 
and  for  simplicity,  a  direct  or  indirect  qualitative  relationship  between  the  sidesway 
stiffness  and  story  mass  is  neglected.  Evaluating  the  sidesway  stiffness  of  each  story 
yields the following stiffness matrix k:

$$\mathbf{k} = \begin{bmatrix} k_1 & -k_1 & 0 \\ -k_1 & k_1 + k_2 & -k_2 \\ 0 & -k_2 & k_2 + k_3 \end{bmatrix}$$

The resulting mass matrix for each story is

$$\mathbf{m} = \begin{bmatrix} m_1 & 0 & 0 \\ 0 & m_2 & 0 \\ 0 & 0 & m_3 \end{bmatrix}$$

Expanding the frequency equation gives

$$(k_1 - \omega^{2} m_1)(k_1 + k_2 - \omega^{2} m_2)(k_2 + k_3 - \omega^{2} m_3) - k_2^{2}\,(k_1 - \omega^{2} m_1) - k_1^{2}\,(k_2 + k_3 - \omega^{2} m_3) = 0$$
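
The expanded frequency equation can be checked against a direct determinant and solved numerically for its lowest root. The sketch below is illustrative only: the function names, the unit stiffness and mass values, and the scan-and-bisect root search are assumptions made for the example, not part of the qualitative analysis.

```python
def frequency_determinant(k, m, lam):
    """det(K - lam * M) by cofactor expansion; lam stands for omega squared."""
    k1, k2, k3 = k
    m1, m2, m3 = m
    a = [[k1 - lam * m1, -k1, 0.0],
         [-k1, k1 + k2 - lam * m2, -k2],
         [0.0, -k2, k2 + k3 - lam * m3]]
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def frequency_polynomial(k, m, lam):
    """Expanded form of the frequency equation for the three story frame."""
    k1, k2, k3 = k
    m1, m2, m3 = m
    top = k1 - lam * m1
    mid = k1 + k2 - lam * m2
    low = k2 + k3 - lam * m3
    return top * mid * low - k2 ** 2 * top - k1 ** 2 * low


def lowest_root(k, m, step=0.001, limit=100.0):
    """Smallest positive root in lam: scan for a sign change, then bisect."""
    lo = 0.0
    f_lo = frequency_polynomial(k, m, lo)
    while lo < limit:
        hi = lo + step
        if f_lo * frequency_polynomial(k, m, hi) <= 0.0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                f_mid = frequency_polynomial(k, m, mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            return 0.5 * (lo + hi)
        lo, f_lo = hi, frequency_polynomial(k, m, hi)
    raise ValueError("no root found below limit")


unit = (1.0, 1.0, 1.0)
base = lowest_root(unit, unit)
stiffer = lowest_root((1.0, 1.0, 2.0), unit)  # stiffer bottom story
lighter = lowest_root(unit, (1.0, 1.0, 0.5))  # lighter bottom floor
print(base < stiffer, base < lighter)         # True True
```

For these sample values, raising one stiffness or lowering one mass while holding everything else fixed raises the lowest root, which is the qualitative behavior the network is expected to encode.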


There  are  three  valid  solutions  to  this  equation,  some  of  which  could  be  zero 
or  complex;  however,  due  to  the  nature  of  any  stable  structural  system,  this  will  not 
occur.  Of  the  three  valid  solutions,  the  design  requirements  constrain  the  solution  to 
the  smallest  frequency,  but  this  qualitative  system  does  not  make  distinctions  between 
small and smaller. The question of the necessity and validity of a qualitative root
finding methodology arises. For an answer, we must focus on the purpose of the
qualitative analysis. The properties of structural systems indicate that even though the
features and variables of this type of design problem affect the outcome of all three
possible frequencies, the result will generally be consistent among all three frequencies.
In other words, this qualitative analysis is generally valid for all three frequencies.
Thus, increasing any stiffness terms without changing any masses will generally
increase the lowest frequency, and decreasing any mass without changing any stiffness
also increases the lowest frequency.

Once again the frequency equation must be decomposed into equations with three
variables in the same manner as was done in the previous example. This results in 35
representational feature neurons, 312 knowledge atom neurons, and 1872 connections.
The annealing schedule is shown in Figure 32. Since there are more neurons in this
example network, the annealing schedule was extended from 300 to 400 cycles to ensure
that all neurons had sufficient time to update. For this problem, this annealing schedule
seemed to work better than the annealing schedule used in the previous two examples.

Figure 32: Annealing Schedule for Structure 3

The  problem  presented  to  this  network  was  a  completion  problem  that  was  to 
determine  the  required  direction  of  change  for  the  column  cross  section  properties 
given  the  following  conditions: 

•  The  natural  frequency  should  increase. 

•  The  modulus  of  elasticity  will  not  change. 

•  The  width  of  the  frame  will  not  change. 

•  The  height  of  each  story  will  not  change. 

•  The  mass  of  each  story  will  not  change. 

This problem was presented to the network three times. Figure 33 shows the system
harmony values for the first run. The maximum valid system harmony value is 6.0, and
the results for the three runs are shown in Table 3.

Figure 33: System Harmony Values

This problem demonstrates some limitations of the chosen qualitative space (+, -, 0).
In order to ensure the natural frequency increases, at least one of the three column
cross sectional properties should increase as long as the other sections remain
unchanged; however, in all three test runs, at least one of the cross sections decreased.
Depending on the relative magnitudes of the increases and decreases, each of these
result sets could be valid. The simple qualitative state space chosen does not represent
this type of reasoning, and as a result this qualitative analysis is of marginal use. Using
a more elaborate qualitative space might remove some ambiguities, but it would also
have some drawbacks. First, it would drastically increase the complexity of the neural
network. Second, it could lead to more ambiguities since more information would be
required about relative magnitudes of variables, and this information might not be
available.

Table 3: Structure 3 Results

Summary
This  chapter  began  by  illustrating  how  an  artificial  neural  network  could  solve 
constraint  satisfaction  problems.  It  was  proposed  that  some  preliminary  design 
problems  are  in  effect  constraint  satisfaction  problems  and  that  an  artificial  neural 
network  that  solves  constraint  satisfaction  problems  could  be  applied.  Harmony  theory 
networks  were  then  introduced  as  a  type  of  artificial  neural  network  that  can  solve 
some  constraint  satisfaction  problems.  Before  demonstrating  Harmony  theory 
networks,  a  simple  qualitative  analysis  process  was  described  that  is  in  effect  a 
constraint satisfaction process. The use of Harmony theory networks to perform
qualitative analyses of preliminary design problems was then demonstrated using three
preliminary design examples.

Abstractions  used  in  the  design  examples  and  relationships  between  abstract 
design  entities  helped  define  the  qualitative  relationships  encoded  into  the  Harmony 
theory  networks.  When  solving  these  qualitative  analysis  problems,  Harmony  theory 
networks  would  produce  answers  that  were  always  consistent  with  respect  to  the  input 
requirements.  When  given  incomplete  problem  descriptions,  the  Harmony  theory 
networks  "filled-in"  consistent  information  where  possible,  but  with  ambiguous 
problem  descriptions,  Harmony  theory  networks  do  not  have  enough  information  to 
form  complete  problem  solutions. 

The  Harmony  theory  network  simulator  used  in  this  research  did  not  have  any 
automated  learning  process,  and  as  a  result  all  network  connections  were  required  to 
be  hand  coded.  This  limited  the  size  and  scope  of  the  problems  that  could  be 
investigated. The qualitative system chosen required relationships of three variables.
Relationships with more variables required decomposition, resulting in additional
variables representing intermediate results. As the number of relationships describing a
preliminary design problem grew, the size of the Harmony theory network grew at
what appears to be a polynomial rate.

Harmony  theory  networks  were  successfully  used  as  a  qualitative  analysis 
system  for  small  preliminary  design  problems.  The  networks  worked  with  numerous 
constraints,  and  they  exhibited  robustness  in  the  sense  that  they  developed  consistent 
results  given  conflicting  or  incomplete  requirements.  As  an  automated  design  tool, 
Harmony  theory  networks  are  limited  in  the  size  of  the  problems  they  can  solve.  The 
qualitative  analysis  system  developed  here  also  limits  the  size  of  problems  to  those 
with  a  dozen  or  so  variables.  Design  problems  appear  to  warrant  a  richer  qualitative 
state  space  than  the  simple  state  space  embodying  only  increasing,  decreasing,  and 
unchanging  relationships. 


BACKPROPAGATION  NETWORKS 

Backpropagation  networks  refer  to  those  neural  networks  that  use  the 
backpropagation  of  error  to  train  multiple  layer  networks  [Rumelhart86c].  This 
chapter  examines  backpropagation  training  of  multilayer  networks  and  variations. 
Backpropagation  type  networks  are  feedforward  networks  that  map  given  input  to 
some  output.  They  perform  best  on  classification,  pattern  recognition,  and 
generalization  tasks.  In  all  the  backpropagation  networks  considered  here,  the  neurons 
are  fully  connected  between  layers,  that  is,  each  neuron  is  connected  to  every  other 
neuron  in  the  layer  above  and  below  it.  Neurons  are  not  connected  within  a  layer; 
there  are  no  feedback  loops.  There  is  no  consideration  for  pruning  or  otherwise 
optimizing  the  network  topology.  The  size  of  the  network  is  fixed  at  the  time  of 
training. 

The  performance  characteristics  of  the  networks  that  are  implemented  as  part 
of this research are discussed in a later chapter. Here, we describe and summarize the
mathematical basis for the learning dynamics of this type of neural network.
Generalization  and  topology  issues  are  not  directly  dealt  with  here  since  they  are  more 
implementation dependent.

Background
Networks  without  hidden  layers  perform  well  on  tasks  that  map  similar  input 
patterns  to  similar  output  patterns.  These  networks  can  generalize  and  perform  well  on 
new  patterns  since  output  patterns  typically  are  represented  close  to  the  given  input. 
This  constraint,  however,  limits  these  networks  to  simple,  similar  mappings.  In  the 
"real  world"  mappings  occur  between  rather  dissimilar  patterns.  Minsky  and  Papert 
[Minsky69]  showed  that  without  hidden  units  a  two  layer  network  cannot  solve  one  of 
the  simplest  of  mappings,  the  exclusive-or  (XOR) 
problem.  The  XOR  problem  is  a  mapping  where  patterns 
that  are  opposite  must  produce  identical  output.  Table  4 
shows  the  desired  XOR  mappings. 
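
A brute-force check can illustrate this limitation. The sketch below is not part of the original study: it assumes a single threshold unit with two inputs and a bias, and the names `fits` and `GRID` and the coarse weight grid are illustrative choices. A grid search is a demonstration and not a proof, but it shows that a unit without hidden neurons can reproduce OR while no weight setting on the grid reproduces XOR.

```python
from itertools import product

# Truth tables as {(x1, x2): target}; XOR is the mapping shown in Table 4.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}

# Illustrative search grid: weights and bias from -4.0 to 4.0 in steps of 0.5.
GRID = [k / 2.0 for k in range(-8, 9)]


def fits(table):
    """Return True if some (w1, w2, b) on GRID reproduces the table with a
    single threshold unit: output is 1 when w1*x1 + w2*x2 + b > 0."""
    for w1, w2, b in product(GRID, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(t)
               for (x1, x2), t in table.items()):
            return True
    return False


print("OR  representable without hidden units:", fits(OR))
print("XOR representable without hidden units:", fits(XOR))
```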

Minsky  and  Papert  showed  that  hidden  units  can  augment  the  input  pattern 
such  that  the  network  could  properly  map  the  XOR  problem.  The  difficulty  lies  with 
training  such  artificial  neural  networks.  The  delta  learning  rule  used  in  two  layer 
networks  was  proposed  by  Widrow  and  Hoff  [Widrow60]  and  had  been  used  for 
almost  a  decade  when  Minsky  carefully  detailed  inherent  weaknesses  in  two  layer 
networks.  The  original  delta  learning  rule  does  not  apply  for  networks  with  hidden 
layers.  It  was  not  until  Rumelhart  et  al.[Rumelhart86c]  developed  a  general  learning 
procedure  for  training  networks  with  hidden  layers  that  these  types  of  networks 
became  useful.  It  is  important  to  note  that  several  other  researchers  independently 
developed  similar  learning  procedures  [Parker85,  le  Cun85,  Werbos74].  The 
backpropagation learning algorithm is a generalization of the delta learning rule
presented by Widrow and Hoff [Widrow60] and has a strong analogy to the chain rule
for partial derivatives. The presentation of the process for backpropagation of error
given here generally follows that of Rumelhart et al. [Rumelhart86c].

Table 4: Desired XOR Mappings

Input    Output
00       0
01       1
10       1
11       0


Backpropagation  of  Error 
This  section  describes  how  backpropagation  of  error  is  used  to  adjust  the 
connection  weights  between  neurons  in  a  network  with  a  hidden  layer  of  neurons  in 
order to produce a desired output. This is a two stage process where external input is
fed into the network at an input layer to produce output at an output layer, and this
output  is  then  compared  to  desired  output  with  the  resulting  difference  used  to  adjust 
connection  weights.  A  brief  description  of  these  two  stages  informally  introduces  the 
process,  and  the  following  three  sections  illustrate  the  mathematical  details  and  present 
some common modifications to the process.

Figure 34: Part of a Network With One Hidden Layer

Figure 34 shows part of a network with one hidden layer. During the first stage
of processing (the forward pass), the input layer, $i$, receives external input, $x_i$, and the
network propagates this input to the hidden layer, $j$, by producing output activations,
$o_i$. The hidden layer, $j$, processes the output from layer $i$, producing output activations,
$o_j$, and propagates $o_j$ to the output layer $k$. Layer $k$ processes $o_j$ and produces output
activations, $o_k$, from the network. This completes the forward pass. The backward
pass compares the calculated output activations, $o_k$, to the desired output activations,
$t_k$, producing an error signal, $\delta_k$, that is used to modify the weights, $w_{kj}$, connecting
neurons between layers $k$ and $j$. The error signal, $\delta_k$, is propagated back to layer $j$ and
is used to modify the weights, $w_{ji}$, connecting neurons between layers $j$ and $i$. This
forward/backward sequence is repeated until the error signal is sufficiently reduced.
The  backpropagation  of  error  algorithm  minimizes  the  error  measure  between  the 
calculated  output  activations  and  the  target  activations  of  the  output  layer.  This  is 
called  supervised  learning  since  the  desired  output  patterns  are  defined  prior  to 
execution  of  the  algorithm. 

The  backpropagation  of  error  process  is  presented  in  three  parts.  The  first  part 
details  how  the  network  calculates  the  output  during  the  forward  pass.  The  second 
section  derives  the  procedure  for  modifying  the  weights  during  the  backward  pass. 
The  third  section  presents  some  common  modifications  to  the  basic  backpropagation 
algorithm  that  have  become  de  facto  additions.  Following  these  sections,  this  chapter 
presents  some  variations  to  the  backpropagation  algorithm  that  can  help  in  overcoming 
some  weaknesses  inherent  in  the  algorithm. 

Forward  Pass— Calculating  the  Output 

Calculating the output for a backpropagation network follows the general
procedure presented in the third chapter. The input, $x_i$, to a layer of neurons $j$, is
multiplied by the weight matrix, $w_{ji}$, connecting the neurons of layer $i$ to layer $j$,
producing values $u_j$ as follows:

$$u_j = \sum_i w_{ji} x_i \tag{1}$$

These values, $u_j$, are then operated on by a nonlinear activation function, $f()$, producing
activation outputs, $o_j$.

$$o_j = f(u_j) \tag{2}$$

The sigmoid activation function is used in most implementations of
backpropagation learning. Continuing the description of the forward pass process, the
input to layer $k$ is the output of layer $j$. The weights, $w_{kj}$, connecting layers $k$ and $j$ are
multiplied by $o_j$, producing the values $u_k$ as shown in the following equation:

$$u_k = \sum_j w_{kj} o_j \tag{3}$$

The $u_k$ values are operated on by a nonlinear activation function, $f()$, producing
activation outputs, $o_k$.

$$o_k = f(u_k) \tag{4}$$

This  completes  the  forward  pass. 
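
The forward pass can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulator used in this research: the function names, the nested-list weight layout (`weights[r][c]` connects input `c` to neuron `r`), and the 2-2-1 example weights are all hypothetical, and no bias terms are included because equations (1) and (3) do not contain any.

```python
import math


def sigmoid(u):
    """Logistic activation f(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def layer_forward(weights, inputs):
    """One layer: weighted sums (equations 1 and 3), then activation
    (equations 2 and 4)."""
    u = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    return [sigmoid(value) for value in u]


def forward(w_ji, w_kj, x):
    """Forward pass through input layer i, hidden layer j, output layer k."""
    o_j = layer_forward(w_ji, x)
    o_k = layer_forward(w_kj, o_j)
    return o_j, o_k


# Hypothetical 2-2-1 network with arbitrary illustrative weights.
w_ji = [[0.5, -0.5], [0.25, 0.75]]
w_kj = [[1.0, -1.0]]
o_j, o_k = forward(w_ji, w_kj, [1.0, 0.0])
print(o_j, o_k)
```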

Backward  Pass-Adjusting  the  Weights 

This phase can be viewed as a classical unconstrained nonlinear optimization
problem. The solution involves modifying the weights in such a way as to minimize
the error between the calculated neuron output activations and the given target output
activations. The error between the calculated output activations, $o_k$, and the target
output activations, $t_k$, is defined as follows:

$$E = \frac{1}{2} \sum_k (t_k - o_k)^2 \tag{5}$$

To minimize $E$, its gradient must vanish

$$\nabla E(w_{kj}) = 0 \tag{6}$$

In general it is very hard to solve this problem directly due to the highly nonlinear
nature of the error. An iterative, numerical search can be made to minimize $E$ using
gradient information. If the gradient $\nabla E(w_{kj})$ can be calculated, then we can search in
the negative gradient direction to locate a minimum; this process is known as
gradient descent. For a given set of weights, $w_{kj}$, we can use the negative error
gradient to modify the initial set of weights in an iterative manner as

$$w_{kj} \leftarrow w_{kj} - \eta \frac{\partial E}{\partial w_{kj}} \tag{7}$$

where $\eta$ is a scalar parameter. So we wish to make incremental changes to the
weights, $\Delta w_{kj}$, that are proportional to $-\partial E / \partial w_{kj}$

$$\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}} \tag{8}$$

The error, $E$, is expressed in terms of the output activations $o_k$ shown in equation (4),
where $u_k$ is calculated from equation (3). The partial derivative of the error with
respect to the weights can be evaluated using the chain rule

$$\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial u_k} \frac{\partial u_k}{\partial w_{kj}} \tag{9}$$

From equation (3), we can evaluate the second term in equation (9) to be

$$\frac{\partial u_k}{\partial w_{kj}} = \frac{\partial}{\partial w_{kj}} \sum_j w_{kj} o_j = o_j \tag{10}$$

The first term in equation (9) remains to be evaluated. Equation (4) provides a
relationship between $u_k$ and $o_k$, and equation (5) relates $E$ to $o_k$. Using the chain rule
the following relationship is shown:

$$\frac{\partial E}{\partial u_k} = \frac{\partial E}{\partial o_k} \frac{\partial o_k}{\partial u_k} \tag{11}$$

From equation (5)

$$\frac{\partial E}{\partial o_k} = -(t_k - o_k) \tag{12}$$

From equation (4)

$$\frac{\partial o_k}{\partial u_k} = f'(u_k) \tag{13}$$

Therefore, equation (8) can be written as

$$\Delta w_{kj} = \eta (t_k - o_k) f'(u_k) o_j \tag{14}$$

Equation (14) only specifies the change to the weights that are connected to the
output layer. The changes required for the weights prior to those connected
to the output layer also follow from gradient descent

$$\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} \tag{15}$$

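Using the chain rule in a similar manner as equation (9), the error gradient can be
evaluated as

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial u_j} \frac{\partial u_j}{\partial w_{ji}} \tag{16}$$

Using equation (1), the second term in equation (16) is evaluated as

$$\frac{\partial u_j}{\partial w_{ji}} = x_i \tag{17}$$

The first term in equation (16) is evaluated using the chain rule in a similar manner as
equation (11)

$$\frac{\partial E}{\partial u_j} = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial u_j} \tag{18}$$

The second term in equation (18) is evaluated by using equation (2)

$$\frac{\partial o_j}{\partial u_j} = f'(u_j) \tag{19}$$

The first term in equation (18) is evaluated using the previous error signal that is
defined as

$$\delta_k = -\frac{\partial E}{\partial u_k} \tag{20}$$

Using the chain rule and equation (3), we can evaluate the first term in equation (18)
as

$$\frac{\partial E}{\partial o_j} = \sum_k \frac{\partial E}{\partial u_k} \frac{\partial u_k}{\partial o_j} = -\sum_k \delta_k w_{kj} \tag{21}$$

Using equations (17), (19), and (21), equation (15) can be written as

$$\Delta w_{ji} = \eta \sum_k \delta_k w_{kj} f'(u_j) x_i \tag{22}$$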

Equation (14) defines the modification to weights that are directly connected
to the output layer, and equation (22) defines the modification required for all other
weights. If multiple input-target pairs are presented to the network, the weight
modifications defined by equations (14) and (22) are accumulated using the following
equations:

$$\Delta w_{kj} = \sum_p \Delta_p w_{kj} \tag{23}$$

$$\Delta w_{ji} = \sum_p \Delta_p w_{ji} \tag{24}$$

where there are $p$ unique input-target pairs presented to the network. Once all $p$
training pairs have been processed, the weight changes are added to the current
weights (equation 7) and the iteration continues. Fahlman [Fahlman89] calls this
method of updating the weights periodic updating. One presentation of all training
pairs is called an epoch. Provided the weight changes ($\Delta w$) are small, it is not necessary
to accumulate the weight changes before updating the weight values. The weights can
be updated once the weight changes have been calculated for each training pair. This
method of updating the weights is called continuous updating [Fahlman89].

1 Note that if layer $i$ were not the input layer, the output of layer $i$ would be used (i.e.,
$o_i$) and the procedure would continue.
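
The backward pass and periodic updating can be sketched as follows. This is an illustrative reading of equations (14) and (22) to (24), not the implementation used in this research. It assumes a sigmoid activation, no bias terms, and nested-list weights, and the function names and the 2-2-1 example network are hypothetical. Comparing the analytic weight changes against a finite-difference estimate of $-\eta \, \partial E / \partial w$ is a convenient way to check such code.

```python
import math


def f(u):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-u))


def f_prime(u):
    """Derivative of the sigmoid."""
    return f(u) * (1.0 - f(u))


def forward(w_ji, w_kj, x):
    """Forward pass, equations (1) to (4). No bias terms, as in the text."""
    u_j = [sum(w * xi for w, xi in zip(row, x)) for row in w_ji]
    o_j = [f(u) for u in u_j]
    u_k = [sum(w * oj for w, oj in zip(row, o_j)) for row in w_kj]
    o_k = [f(u) for u in u_k]
    return u_j, o_j, u_k, o_k


def error(w_ji, w_kj, x, t):
    """E = 1/2 * sum_k (t_k - o_k)^2, equation (5)."""
    o_k = forward(w_ji, w_kj, x)[3]
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o_k))


def weight_changes(w_ji, w_kj, x, t, eta):
    """Weight changes for one input-target pair: equations (22) and (14)."""
    u_j, o_j, u_k, o_k = forward(w_ji, w_kj, x)
    # Output error signal: delta_k = (t_k - o_k) * f'(u_k).
    delta_k = [(tk - ok) * f_prime(uk) for tk, ok, uk in zip(t, o_k, u_k)]
    dw_kj = [[eta * dk * oj for oj in o_j] for dk in delta_k]
    # Error signal propagated back to hidden neuron j.
    delta_j = [
        f_prime(u_j[j]) * sum(delta_k[k] * w_kj[k][j] for k in range(len(w_kj)))
        for j in range(len(u_j))
    ]
    dw_ji = [[eta * dj * xi for xi in x] for dj in delta_j]
    return dw_ji, dw_kj


def train_epoch(w_ji, w_kj, pairs, eta):
    """Periodic updating: sum the changes over all pairs (equations 23 and 24),
    then add them to the weights in place."""
    acc_ji = [[0.0] * len(row) for row in w_ji]
    acc_kj = [[0.0] * len(row) for row in w_kj]
    for x, t in pairs:
        dw_ji, dw_kj = weight_changes(w_ji, w_kj, x, t, eta)
        for acc, dw in ((acc_ji, dw_ji), (acc_kj, dw_kj)):
            for r, row in enumerate(dw):
                for c, value in enumerate(row):
                    acc[r][c] += value
    for weights, acc in ((w_ji, acc_ji), (w_kj, acc_kj)):
        for r, row in enumerate(acc):
            for c, value in enumerate(row):
                weights[r][c] += value


# Hypothetical 2-2-1 network and a single training pair.
w_ji = [[0.5, -0.5], [0.25, 0.75]]
w_kj = [[1.0, -1.0]]
pairs = [([1.0, 0.5], [1.0])]
before = error(w_ji, w_kj, *pairs[0])
train_epoch(w_ji, w_kj, pairs, eta=0.5)
after = error(w_ji, w_kj, *pairs[0])
print(before, after)
```

Continuous updating would instead add the result of `weight_changes` to the weights immediately after each pair, without the accumulators.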

Learning Rate, Weight Initialization, and Momentum

For backpropagation, there are two arbitrary coefficients that must be set
before commencing training. These are the scalar parameter, $\eta$, which is called the
learning rate [Rumelhart86c], and the range in which weights are initialized. The value
chosen for the learning rate is important to the speed of learning. The change in weight
must be proportional to the derivative of the error measure. Gradient descent requires
that small steps be taken; therefore, the learning rate term, $\eta$, is the constant of
proportionality or step size. The larger the learning rate, the larger the changes in the
weights. Learning rates greater than 1.0 can easily lead to oscillations, but very small
learning rates cause learning to become impossibly slow. In the backpropagation
process, a single learning rate is applied to all weight changes. Therefore, conservative
choices for the learning rate should be in the following range:

$$0 < \eta < 1.0$$
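
The effect of the step size can be seen on a toy problem. The sketch below is only an illustration and uses a hypothetical one-weight error, $E(w) = \frac{1}{2} w^2$, whose gradient is simply $w$. For this particular curvature the update overshoots the minimum once the learning rate exceeds 1.0; on a real network error surface the threshold depends on the local curvature and will differ.

```python
def descend(eta, w0=1.0, steps=6):
    """Gradient descent on the toy error E(w) = 0.5 * w**2 (gradient is w).
    Returns the sequence of weight values, starting with w0."""
    w, path = w0, [w0]
    for _ in range(steps):
        w = w - eta * w  # equation (7) with dE/dw = w
        path.append(w)
    return path


print(descend(0.5))  # shrinks steadily toward the minimum at w = 0
print(descend(1.5))  # overshoots: the sign of w alternates on every step
```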

The  learning  process  modifies  a  set  of  initial  weights  such  that  the  output  layer 
error  is  minimized.  An  initial  set  of  weights  must  be  provided  as  a  starting  point  for 
this  minimization.  The  initial  set  is  usually  made  up  of  random  values  that  are 
initialized  between  an  upper  and  lower  bound.  If  the  starting  weights  are  near  an  error 
minimum,  then  the  number  of  iterations  required  to  minimize  the  error  will  generally 
be relatively small. If weight magnitudes are very large or very small, then, because the
error is propagated back through the weights in proportion to the values of the
weights (equation 22), round-off and overflow errors are possible.

If  all  weights  start  with  equal  values  and  if  the  solution  requires  unequal 
weights,  no  learning  will  occur  since  the  error  propagated  back  through  the  network  is 
proportional  to  the  magnitudes  of  the  weights  (equation  22).  All  hidden  neurons 
directly connected to the output neurons will get the same error signal and the weights
connecting those units will always be the same. To counteract this problem, the
weights are initialized with random values. Based on experience, a conservative range
for weight initialization is

$$-1.0 < w_{ij} < 1.0$$

Larger  and  smaller  ranges  can  be  used. 

Rumelhart et al. [Rumelhart86c] found that by slightly modifying the basic
algorithm with a momentum term, they could increase the rate of learning without
leading to oscillations. Equation (25) shows the inclusion of the momentum term, $\alpha$, in
equation (8). The superscript $t$ indicates the current epoch number.

$$\Delta w_{ji}^{t} = -\eta \frac{\partial E^{t}}{\partial w_{ji}} + \alpha \Delta w_{ji}^{t-1} \tag{25}$$

The momentum term is a constant that determines the effect of past weight changes on
the current weight change. The momentum term implements a form of smoothing that
filters out oscillations in steep regions of weight space caused by the alternating sign of
the gradient. Momentum does this by focusing the movement in a downhill direction
by allowing the previous movement direction to affect the current direction.
Momentum magnifies the learning rate for flat regions of weight space where the
gradient is close to constant. The typical range of values for $\alpha$ is

$$0 < \alpha < 1.0$$

The  literature  has  not  reached  a  consensus  on  a  good  value  for  the  momentum 
term. Fahlman [Fahlman89] believes that small values for momentum ($\alpha < 0.5$)
improve  convergence  while  others  have  successfully  used  higher  values  for  momentum 
[Rumelhart86c,  le  Cun90].  Since  little  work  has  been  done  on  collecting  a  standard  set 
of  testing  problems  for  backpropagation,  one  can  conclude  very  little  from  these 
claims  since  each  paper  uses  different  test  problems  and  network  parameters. 
Tollenaere  [Tollenaere90]  has  done  perhaps  the  most  detailed  test  of  the  effect  of 
momentum.  This  chapter  discusses  his  work  in  the  section  on  learning  rate  adaptation; 
however,  his  findings  on  the  influence  of  momentum  are  summarized  as  follows: 

•  Using  momentum  speeds  up  learning,  typically  by  a  factor  of  two  to  three. 

•  Using  a  high  momentum  value  does  not  result  in  instability,  provided  a  small 
learning  rate  is  used. 


•     Using  a  small  momentum  value  results  in  a  wider  distribution  of  learning  times. 

To help identify two limitations of momentum, we can rewrite equation (25) as
the following exponentially weighted sum:

$$\Delta w_{ji}^{t} = -\eta \sum_{k=0}^{t} \alpha^{k} \frac{\partial E^{t-k}}{\partial w_{ji}}$$


The  first  limitation  of  momentum  becomes  evident  when  considering  what 
happens  if  the  current  error  derivative  changes  sign  with  respect  to  the  previous  time 
step.  If  the  current  change  is  small  enough,  then  the  momentum  term  can  cause  the 
weight  to  be  adjusted  up  the  slope  of  the  error  surface,  instead  of  down  the  slope.  The 
second  limitation  is  that  there  exists  an  upper  bound  on  the  amount  of  change 
momentum can make on a weight. If all the error derivatives over time are
equal to one, then the exponentially weighted sum of the current and past derivatives
converges to $1/(1 - \alpha)$ and the most a weight can be modified is by

$$\frac{\eta}{1 - \alpha}$$

The  effective  maximum  step  size  depends  on  both  the  values  of  momentum  and 
learning  rate  and  can  possibly  be  quite  small. 
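
Both limitations can be checked numerically for a single weight. The sketch below is a hypothetical illustration: the function name, the parameter values, and the gradient sequences are assumptions chosen for the example, and the previous weight change is taken to be zero at the start.

```python
def momentum_steps(eta, alpha, gradients):
    """Apply equation (25) to a sequence of error derivatives for one weight,
    starting from a previous weight change of zero.
    Returns the weight change produced at every step."""
    delta, history = 0.0, []
    for g in gradients:
        delta = -eta * g + alpha * delta
        history.append(delta)
    return history


eta, alpha = 0.1, 0.9

# Second limitation: with every derivative equal to one, the magnitude of the
# weight change approaches eta / (1 - alpha) and never exceeds it.
changes = momentum_steps(eta, alpha, [1.0] * 200)
print(changes[0], changes[-1], eta / (1.0 - alpha))

# First limitation: the last derivative changes sign but is small, so the
# accumulated momentum still moves the weight in the old direction.
print(momentum_steps(eta, alpha, [1.0, 1.0, 1.0, -0.1]))
```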

There  are  several  modifications  that  can  be  made  to  backpropagation  learning 
that  make  the  process  faster.  These  modifications  are  discussed  in  the  next  sections. 

Variations 
The greatest obstacle to the widespread use of artificial neural networks in
real-world applications is the slow speed at which most networks learn. The power
and generality of backpropagation make it an appealing network paradigm. In difficult
applications where there are highly nonlinear relationships between inputs and outputs,
backpropagation networks have displayed accurate, reliable results; however,
backpropagation suffers from slow training times and scales poorly as the number of
input and output neurons increases. Researchers in numerical optimization will quickly
point out that gradient descent, its fundamental operating principle, is not the most
efficient unconstrained optimization algorithm, although it is one of the oldest known
methods. If network users needed to train their networks a single time, then long
training times might be tolerable for some, but iterative optimization of a network's
input/output parameters, when developing features from the data set, often requires
several repetitions of training. For practical applications this means completely
retraining the network. When training takes days and even weeks without a guarantee
of success, this can impose intolerably long waiting periods during development. To
overcome this basic limitation, a great deal of research has been done to speed up
learning.

This  section  discusses  several  methods  for  making  the  learning  dynamics  more 
efficient  in  backpropagation  type  networks.  Several  of  the  methods  make  use  of 
heuristics  that  have  some  basis  in  second-order  numerical  processes.  Others  are  slight 
modifications  that  can  be  easily  incorporated  in  an  algorithm  without  changing  the 
basic  premise. 

Flat  Spots 

Fahlman [Fahlman89] noted that during some of his experiments, some units
would get stuck either at high activation values or at zero. This is due to what he
terms "flat spots" where the derivative of the sigmoid function approaches zero. The
sigmoid activation function and its derivative are given as

$$o = f(u) = \frac{1}{1 + e^{-u}}, \qquad f'(u) = o(1 - o)$$

As we propagate the error backwards, the error seen by each neuron is multiplied by
the derivative of the sigmoid function. This derivative goes to zero whenever the
output of the sigmoid goes to zero or one. For those neurons whose output goes to
zero or one, even if the error is large for that neuron, only a trivial portion of this error
will be passed to the incoming weights and neurons in earlier layers. This phenomenon
increases learning time and can cause the network to enter paralysis if round-off errors
become too significant.

Fahlman suggested and tested a very simple, effective means for eliminating
these flat spots. He modified the derivative of the sigmoid function such that it never
goes to zero for any output value by adding a constant 0.1 to the value of the
derivative before it is used to scale the backpropagation error. In some of his tests,
learning  time  was  cut  by  half.  This  modification  is  very  simple  and  applicable  to 
standard  backpropagation  and  its  modifications. 
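
A short numerical illustration of the flat spot and of the offset follows. The function names are hypothetical, and the derivative is written in terms of the sigmoid output $o$, which is a common convenience and an assumption here, not something prescribed by the text.

```python
import math


def sigmoid(u):
    """Logistic activation f(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def sigmoid_prime(o):
    """Derivative of the sigmoid expressed through its output o = f(u)."""
    return o * (1.0 - o)


def sigmoid_prime_offset(o, offset=0.1):
    """The adjustment described above: add a constant to the derivative so
    that it never reaches zero."""
    return sigmoid_prime(o) + offset


# The plain derivative collapses as the unit saturates; the offset version
# stays at or above 0.1.
for u in (0.0, 5.0, 10.0):
    o = sigmoid(u)
    print(u, sigmoid_prime(o), sigmoid_prime_offset(o))
```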

Symmetric  Activation  Function 

In another attempt to reduce the effect of flat spots, Stornetta [Stornetta87]
did a study to examine the effects of altering the dynamic range of the sigmoid
function. He found that by using a range symmetric about zero (ranging from -1/2 to
1/2) rather than from 0 to 1, improvements in learning times would result.

Offsetting the training pattern pairs to range from -1/2 to 1/2 instead of from 0
to 1 and offsetting the sigmoid to range from -1/2 to 1/2, Stornetta found that neurons
would train faster. His change to the sigmoid function is

$$o = f(u) = -\frac{1}{2} + \frac{1}{1 + e^{-u}}$$

He tested this modification using several different network architectures and binary
problems that ranged from the 2-2-1 XOR problem to problems that used a 203-80-26
topology and found that improvements in learning speed ranged from 10% to 50%.
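
As a minimal sketch, the shifted activation and the matching offset of the training patterns might look like this. The function names are illustrative assumptions and are not taken from Stornetta's work.

```python
import math


def symmetric_sigmoid(u):
    """Sigmoid shifted to the range (-1/2, 1/2):
    f(u) = -1/2 + 1 / (1 + exp(-u))."""
    return -0.5 + 1.0 / (1.0 + math.exp(-u))


def shift_pattern(values):
    """Offset 0/1 training values to -1/2 and +1/2 so that they match the
    range of the shifted sigmoid."""
    return [v - 0.5 for v in values]


print(symmetric_sigmoid(0.0))   # a zero net input now gives a zero output
print(shift_pattern([0, 1, 1, 0]))
```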

Depending on the problem, either Fahlman's or Stornetta's modification might
work better. At this time, they both appear to function well without adversely affecting
backpropagation's performance.

Hyperbolic  Arctangent  Error  Function 

In  standard  backpropagation  the  error  is  simply  the  difference  between  the 
calculated  output  value  and  the  target  value  for  a  particular  pattern.  This  difference  is 
then  propagated  back  through  the  network.  Fahlman  [Fahlman89]  noted  that  one  way 
of  accelerating  learning  was  to  allow  the  error  to  grow  disproportionately  large  for 
those  neurons  that  are  very  far  from  their  target. 

Replacing the difference computed at each output unit by the hyperbolic
arctangent function of that difference effectively scales the error. When the difference
is small, the result is almost linear. As the difference approaches 1.0, the error goes to
$+\infty$, and as it approaches -1.0, the error goes to $-\infty$. Since $\pm\infty$ is difficult to deal with
in numerical programs, Fahlman recommends that for any difference greater than
+0.9999999, the hyperbolic arctangent function be modified to return +17.0, and for
differences less than -0.9999999, it return -17.0. Fahlman reports that this
modification reduced learning times by 25% in some simulations.
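
The clamped error function described above can be sketched with the standard library. The function and parameter names are hypothetical; the threshold and the returned limits are the values quoted in the text.

```python
import math


def atanh_error(difference, limit=0.9999999, cap=17.0):
    """Hyperbolic arctangent of the output difference, clamped so that
    differences beyond +/- limit return +/- cap instead of overflowing."""
    if difference > limit:
        return cap
    if difference < -limit:
        return -cap
    return math.atanh(difference)


# Small differences pass through almost unchanged; large ones are amplified.
for d in (0.01, 0.5, 0.9, 1.0, -1.0):
    print(d, atanh_error(d))
```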

Newton's  Method 

Newton's method is an unconstrained optimization technique for functions of
more than one variable. The backpropagation algorithm is based on steepest descent,
perhaps the least sophisticated unconstrained optimization process. Newton's method
is a classical second-order method and provides a basis for understanding several of
the higher order learning methods and heuristics that follow.

The object of Newton's method is to find a minimum of a function of more
than one variable. We are interested in minimizing the error as a function of the
weights, $E(w_{ji})$. Starting with the second-order Taylor series expansion of $E(w_{ji})$ about
the current weight space point at epoch $t$, we have the following:

$$E(w_{ji}^{t} + \Delta w_{ji}) \approx E(w_{ji}^{t}) + \nabla E(w_{ji}^{t}) \Delta w_{ji} + \frac{1}{2} \Delta w_{ji}^{T} \nabla^2 E(w_{ji}^{t}) \Delta w_{ji}$$

Solving for the weight increment,

$$\Delta w_{ji} = -[\nabla^2 E(w_{ji}^{t})]^{-1} \nabla E(w_{ji}^{t})$$

As can be seen for Newton's method, we must provide error function values, gradient
information, and the Hessian matrix of second derivatives. If the error function is
exactly quadratic, Newton's method will find the optimum weights in one step.
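
For an error function that is exactly quadratic, a single Newton step lands on the minimum from any starting point. The sketch below checks this on a hypothetical two-weight quadratic, $E(w) = \frac{1}{2} w^{T} A w - b^{T} w$; the matrix `A`, the vector `B`, and the function names are illustrative assumptions. A network error surface is not quadratic, so there the step is only an approximation.

```python
# Toy quadratic error E(w) = 0.5 * w^T A w - b^T w with a positive definite
# Hessian A. Its minimum is at w = A^{-1} b = [0.2, 0.4].
A = [[3.0, 1.0], [1.0, 2.0]]
B = [1.0, 1.0]


def gradient(w):
    """Gradient of the toy error: A w - b."""
    return [A[0][0] * w[0] + A[0][1] * w[1] - B[0],
            A[1][0] * w[0] + A[1][1] * w[1] - B[1]]


def newton_step(w):
    """Weight increment -H^{-1} * gradient, using the closed-form inverse of
    the 2x2 Hessian (which equals A for this error)."""
    g = gradient(w)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    inv = [[A[1][1] / det, -A[0][1] / det],
           [-A[1][0] / det, A[0][0] / det]]
    return [-(inv[0][0] * g[0] + inv[0][1] * g[1]),
            -(inv[1][0] * g[0] + inv[1][1] * g[1])]


w = [2.0, -1.0]
step = newton_step(w)
w_new = [w[0] + step[0], w[1] + step[1]]
print(w_new, gradient(w_new))
```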

There are several problems with Newton's method that make it impractical.
One is that the Hessian matrix may be singular or not positive definite as required to
guarantee a minimum, and another problem is that the Hessian matrix and its inverse
are required. Calculating the Hessian matrix is expensive in terms of required storage
and computation. There are a number of modifications to Newton's method that
improve its convergence and overcome many of its drawbacks. These optimization
methods are called quasi-Newton or variable metric methods. Watrous [Watrous87]
implemented the Broyden-Fletcher-Goldfarb-Shanno (BFGS) and
Davidon-Fletcher-Powell (DFP) algorithms, two quasi-Newton methods [Press88]. For $n$ total weights
in a network, the quasi-Newton methods require $O(n^2)$ operations for each iteration
compared to $O(n)$ operations for backpropagation. The computational advantage of
quasi-Newton methods is only achieved for small to moderate problems due to their
computational complexity.

The  backpropagation  algorithm  changes  a  weight  based  only  on  local 
information  available  to  the  weight.  Quasi-Newton  methods  do  not  have  this  property 
of  locally  computable  weight  update  terms,  which  has  been  a  traditional  constraint  on 
computations in artificial neural networks. Artificial neural networks have long been
used to simulate biological neural networks, whose neurons perform only local
computations. Jacobs [Jacobs88] believes that the local computation constraint will
facilitate  implementation  on  parallel  computing  architectures.  The  methods  that  follow 
adhere  to  this  constraint  in  the  spirit  of  artificial  neural  networks. 

Quickprop 

In steepest descent, the set of partial first derivatives collected at a single point
yields very little information about the step size we could take in weight space when
modifying the weights. Using second derivative information, that is, the curvature of
the error function, would provide us with a better estimate of the step size. This is what
Newton's method uses. The second derivative requires a very costly set of calculations
that violate the local computation constraint. Quickprop [Fahlman89] is a modification
to standard backpropagation learning that is heuristic in nature and is founded on
Newton's method. Quickprop proceeds as does backpropagation, and a copy of the
error gradient, $\nabla E(w_{ji})^{t-1}$, and the change in weights, $\Delta w_{ji}^{t-1}$, is kept from the previous
iteration. In addition, the current error gradient, $\nabla E(w_{ji})^{t}$, is available.

Quickprop  is  based  on  two  important  and  somewhat  questionable  assumptions: 

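1. The error surface in weight space is concave upward and parabolic.

2. The error gradient, as seen by each weight, is independent of changes in the
other weights.

The weight change formula adheres to the local computation constraint and follows:

$$\Delta w_{ji}^{t} = \frac{\partial E^{t} / \partial w_{ji}}{\partial E^{t-1} / \partial w_{ji} - \partial E^{t} / \partial w_{ji}} \, \Delta w_{ji}^{t-1} \tag{26}$$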

This  formula  is  derived  from  a  parabola  formed  using  the  previous  error  gradient  and 
current  error  gradient  and  calculates  an  approximation  of  the  minimum  point  of  this 
parabola.  This  is  a  crude  approximation  for  the  weight  change  to  the  true  minimum 
since  it  does  not  take  into  account  the  error  gradients  with  respect  to  other  weights, 
but  when  done  iteratively,  the  algorithm  can  converge  towards  a  minimum  in  weight 
space  [Fahlman89]. 

Because of Quickprop's two basic assumptions about the nature of the error
surface and independence of weight changes, we must examine several weight change
cases to ensure stability. First, if the current slope is smaller than and in the same
direction as the previous slope, Quickprop is heading in the right direction. Second, if
the current slope is in the opposite direction from the previous slope, Quickprop has
passed over the minimum and the next step will continue to converge towards the
minimum. Third, if the current slope is in the same direction, but is larger than the
previous slope, then Quickprop is heading in the wrong direction, moving towards a
local maximum. Fourth, if the current slope is equal to the previous slope, then
Quickprop will take an infinite step. These last two cases must be dealt with in order
to avoid numerical instability and nonconvergence. Fahlman suggests limiting the rate
of increase of the step using a "maximum growth factor," $\mu$. Quickprop applies the
maximum growth factor as follows:

if $\Delta w_{ji}^{t} > \mu \Delta w_{ji}^{t-1}$, then $\Delta w_{ji}^{t} = \mu \Delta w_{ji}^{t-1}$

Since the simultaneous update of other weights causes some "noise" between each
iteration, the growth factor suppresses the influence of the noise. Based on his
experience, Fahlman recommends a value of 1.75 for the maximum growth factor. Values
larger than this can cause the network to behave erratically without converging, and
values too small slow the learning process.
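
The weight change rule and the growth limit can be sketched for a single weight as follows. This is a hedged reading of equation (26) and the four cases above, not Fahlman's code: the function name is hypothetical, the growth limit is compared by magnitude, and falling back to $\mu$ times the previous step in the third and fourth cases is an assumption of this sketch. The gradient descent bootstrap and weight decay are omitted.

```python
def quickprop_delta(slope, prev_slope, prev_delta, mu=1.75):
    """Weight change from equation (26), limited by the growth factor mu.

    slope and prev_slope are dE/dw at the current and previous iteration;
    prev_delta is the previous (nonzero) weight change for the same weight.
    """
    denom = prev_slope - slope
    same_direction = slope * prev_slope > 0
    # Third and fourth cases: the fitted parabola has no usable minimum, so
    # this sketch takes the largest permitted step in the previous direction.
    if denom == 0.0 or (same_direction and abs(slope) >= abs(prev_slope)):
        return mu * prev_delta
    delta = slope / denom * prev_delta
    # Limit the growth of the step relative to the previous step.
    if abs(delta) > mu * abs(prev_delta):
        delta = mu * abs(prev_delta) * (1.0 if delta > 0 else -1.0)
    return delta


def slope_at(w):
    """Slope of the toy parabola E(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)


# One plain gradient step (eta = 0.1) moved the weight from 0.0 to 0.6.
w_prev, w = 0.0, 0.6
unlimited = quickprop_delta(slope_at(w), slope_at(w_prev), w - w_prev, mu=10.0)
limited = quickprop_delta(slope_at(w), slope_at(w_prev), w - w_prev)
print(w + unlimited)  # close to the parabola's minimum at w = 3
print(w + limited)    # the default growth factor shortens the step
```

On an exactly parabolic, single-weight error the unrestricted step reaches the minimum at once, which is the first assumption listed above; the growth factor trades that speed for stability when the assumptions do not hold.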

The  weights  in  Quickprop  are  unbounded  and  for  some  problems,  they  can 
become  so  large  that  they  cause  numerical  overflow  or  drown  out  other  weights. 
Using  a  weight  decay  term,  t,  and  multiplying  it  with  the  slope  computed  for  each 
weight  helps  limit  the  magnitudes  of  weights.  Since  weights  can  quickly  grow  large  in 
Quickprop,  the  weight  decay  term  is  applied  at  each  iteration.  Typical  values  for  the 
weight decay term are around the magnitude of 0.001.

In regions with large gradients, Quickprop may begin to oscillate if the learning
rate parameter is too large. For this reason, most difficult learning tasks require
Quickprop's learning rate, $\eta$, to be set to very small magnitudes on the order of $10^{-3}$ or
smaller. For the momentum parameter, $\alpha$, Quickprop can handle higher values close to
unity.

Finally, Quickprop requires a way to bootstrap the process, both on the first
iteration, when the required previous values are not available, and whenever the weight
change from the previous iteration is zero. The easiest way to do this is by always adding
a gradient descent term (e.g., equation 25) to the weight change calculated from
equation (26). Then for the first iteration, the gradient descent term is used by itself,
and for subsequent iterations, the gradient descent term is also used in every case except
when the current slope is in the opposite direction from the previous slope and is smaller.
In this case the gradient descent term is not applied to the weight change since it could
cause overshoot and oscillation.
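
The cases discussed above can be sketched for a single weight as follows. This is only an illustrative sketch, not the implementation used later in this work. It assumes that equation (26) is Fahlman's secant step, in which the new weight change is the previous change scaled by S(t) / (S(t-1) - S(t)), with S the slope of the error with respect to the weight, and it assumes that weight decay is added to the slope. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-weight memory; the names are illustrative only.
struct QuickpropState {
    double prevSlope = 0.0;  // slope dE/dw seen on the previous epoch, S(t-1)
    double prevStep = 0.0;   // weight change applied on the previous epoch
};

struct QuickpropParams {
    double eta = 0.001;    // learning rate of the gradient descent term
    double mu = 1.75;      // maximum growth factor
    double decay = 0.001;  // weight decay coefficient
};

// Returns the updated weight and records the slope and step for the next epoch.
double quickpropStep(double w, double rawSlope, QuickpropState& s,
                     const QuickpropParams& p)
{
    assert(p.mu > 0.0);
    const double slope = rawSlope + p.decay * w;  // decay added to the slope (assumption)
    const double shrink = p.mu / (1.0 + p.mu);    // slope ratio at which the secant factor equals mu
    double step = 0.0;
    bool addGradientTerm = true;

    if (s.prevStep != 0.0) {
        const bool reversed = slope * s.prevSlope < 0.0;
        const bool smaller = std::fabs(slope) < std::fabs(s.prevSlope);
        double factor;
        if (!reversed && std::fabs(slope) >= shrink * std::fabs(s.prevSlope)) {
            // Slope is equal, larger, or shrinking too slowly: the secant step
            // would be infinite, backwards, or too long, so use the growth limit.
            factor = p.mu;
        } else {
            factor = slope / (s.prevSlope - slope);  // secant estimate of the minimum
        }
        step = factor * s.prevStep;
        // Skip the gradient descent term only when the slope reversed and shrank.
        if (reversed && smaller) addGradientTerm = false;
    }
    if (addGradientTerm) step -= p.eta * slope;  // also bootstraps the first epoch

    s.prevSlope = slope;
    s.prevStep = step;
    return w + step;
}
```

With a zero previous step the function falls back to plain gradient descent, which covers both the first iteration and any later iteration where the weight change was zero.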

In his tests, Fahlman found that Quickprop outperformed standard
backpropagation on a number of N-M-N encoder problems. The objective of the
encoder problems is to reproduce the input vectors at the output where "M" is less
than "N". This requires the network to perform data compression. The problems
ranged from 4-2-4 all the way to 256-8-256 in size. Figure 35 shows the topology of
the 4-2-4 encoder problem. Fahlman's findings indicate that Quickprop is anywhere
from two to ten times faster than standard backpropagation depending on the problem
size. One of the most interesting aspects of his encoder tests was that as the problems
were scaled up, the learning times grew at a rate of log N, where N is the number of
training pairs. This is in contrast to the typical scaling problems of artificial neural
networks where, as the network increases in size, the training times, measured in
epochs, grow exponentially.

Figure 35: 4-2-4 Encoder Problem

Fahlman also tested Quickprop with the XOR problem and found that it was much faster
than standard backpropagation, by an order of magnitude. Lang and Witbrock [Lang89]
tested Quickprop using multiple hidden layers on a highly nonlinear problem involving
continuous valued inputs and outputs. Backpropagation type networks with multiple
hidden layers tend to significantly dilute the error signal as it is passed up through
each successive layer, and as a result learning slows down by an order of magnitude for
each hidden layer. However, learning in highly nonlinear problems can benefit from
multiple hidden layers. In Lang's study, Quickprop had the fastest learning times by more
than a factor of two over backpropagation with momentum. Neither one of these studies is
conclusive since Quickprop has not been employed for a real world problem; however, they
demonstrate Quickprop's potential as a viable alternative to standard backpropagation.

Learning  Rate  Adaptation 

The learning rate term, $\eta$, in standard backpropagation defines the step size, or
the amount that the weights will change for any given training iteration. Small step
sizes increase training times, and large step sizes cause instability. In addition,
characteristics of the error surface vary in different directions associated with each of
the weights; therefore, a single learning rate for all weights is not optimal. Based on
these observations Jacobs [Jacobs88] identified four general heuristics for achieving
faster learning times:

1.  Every  weight  should  have  its  own  step  size. 

2.  Step  sizes  should  be  allowed  to  vary  over  time. 

3.  When  the  derivative  of  a  weight  possesses  the  same  sign  for  consecutive  steps, 
the  step  size  for  that  weight  should  be  increased. 

4.  When  the  derivative  of  a  weight  changes  sign,  the  step  size  should  decrease. 

The first heuristic recognizes that a single step size for all weights is not
optimal. Tollenaere [Tollenaere90] shows that for each problem, there exists an
optimal step size region such that for all step sizes in the region, the process converges
fast and remains stable. The second heuristic accounts for the different properties
along different regions of a single weight dimension. In order to take appropriate steps
as the weight varies over its possible values, the step size should also change
accordingly. The third heuristic takes into account the phenomenon of small curvature
in the region near the current point. The error surface along this dimension continues
to slope in the same direction for a significant distance; therefore, by increasing the
step size, the number of epochs required to traverse this region will be reduced. The
fourth heuristic recognizes when the step size is too large and a minimum value has
been stepped over. When the sign of the derivative changes, the error surface at the
current point frequently possesses high curvature, and the slope of the error surface
changes quickly. To prevent the weight parameter from oscillating, the step size is
reduced.

When each weight is given its own step size, the current point in weight space
is not modified in the direction of the negative gradient, as is done in gradient descent,
nor is the curvature of the region calculated, as is done in Newton's method. Instead,
the heuristics provide an estimate of the curvature in the region of the current point in
weight space, and the weight is modified based on this estimate and the derivative of
the error with respect to the weight. Since these are just heuristics and we are not
calculating the true curvature, we must be aware of situations where these heuristics
may fail and the consequences of failure.

Consider an error surface defined in two dimensional weight space with a steep
valley that is at a 45 degree angle to both weight axes, as shown in Figure 36. In this
valley, the surface has high curvature along both weight dimensions. Because of the
high curvature, a large step size could cause the error gradient to change sign; the
heuristics would then cause the learning rate for each weight to decrease rather than
increase, as it should to lower the error fastest. The failure of these heuristics can be
directly attributed to the locality constraint. Since each weight update is independent of
all others, the mutual benefit of considering nonlocal weight changes cannot be assessed.

Figure 36: Two Dimensional Error Surface

Several  authors  have  proposed  strategies  that  follow  Jacobs'  heuristics.  The 
main  difference  in  each  of  these  implementations  is  in  the  strategy  for  updating  the 
learning  rates.  Jacobs  implemented  a  version  he  calls  the  delta-bar-delta  update  rule, 
where  the  learning  rates  decrease  exponentially  and  increase  linearly.  Jacobs  says  that 
his  method  prevents  the  learning  rates  from  becoming  too  large  too  fast.  Jacobs' 
strategy  allows  for  small  increases  in  the  step  size,  but  this  method  can  have  difficulty 
when  problems  need  some  large  step  sizes  since  it  may  take  a  long  time  before  the  step 
size  grows  large  enough  using  a  linear  increase  rule. 
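
As a minimal sketch of the delta-bar-delta step size update described above, the following C++ fragment shows one way the rule might be coded for a single weight. The parameter names kappa (linear increase), phi (fraction removed on each decrease), and theta (averaging factor for past derivatives) follow the usual statement of Jacobs' rule; the struct and function names are hypothetical and are not taken from any implementation discussed here.

```cpp
#include <cassert>

// Hypothetical per-weight record for illustration.
struct DeltaBarDelta {
    double eta;       // current step size for this weight
    double barDelta;  // exponentially averaged past derivative ("delta-bar")
};

// Adjusts the step size from the current derivative, then updates the average.
void updateStepSize(DeltaBarDelta& s, double delta,
                    double kappa, double phi, double theta)
{
    assert(kappa >= 0.0 && phi > 0.0 && phi < 1.0);
    assert(theta >= 0.0 && theta < 1.0);

    const double agreement = s.barDelta * delta;
    if (agreement > 0.0) {
        s.eta += kappa;          // same sign as the past average: linear increase
    } else if (agreement < 0.0) {
        s.eta *= (1.0 - phi);    // sign change: exponential decrease
    }
    // Blend the current derivative into the running average.
    s.barDelta = (1.0 - theta) * delta + theta * s.barDelta;
}
```

Because the increase is additive, a weight that needs a large step size gains only kappa per epoch, which is the slow growth noted in the preceding paragraph.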

Devos and Orban [Devos88] also implemented these heuristics but
incorporated Quickprop's assumption that the error surface is quadratic. This
assumption is only roughly true in the vicinity of a minimum. If the step size is large,
then  a  change  in  the  weight  derivative's  sign  can  happen  in  a  region  that  is  not  close  to 
the  minimum.  Using  the  assumption  that  the  error  surface  is  quadratic  will  cause  the 
estimate  of  the  minimum  to  occur  at  essentially  a  random  location,  causing  instabilities. 
Quickprop  implements  a  maximum  growth  factor  to  limit  this  effect. 

Both  models  [Devos88,  Jacobs88]  must  also  deal  with  weights  that  may  grow 
infinitely  large.  Tollenaere  developed  a  learning  rule  that  incorporates  Jacobs' 
heuristics  but  does  not  make  the  risky  assumption  that  the  error  surface  is  quadratic. 
Tollenaere calls his learning rule SuperSAB. SuperSAB is an improved version of Devos'
self-adapting backpropagation rule (SAB). The SAB algorithm is as follows:

1. Choose an initial step size $\eta_{ij} = \eta$ for all weights.

2. Perform backpropagation without using a momentum term.

3. If the calculated weight gradient has the same sign as the previous weight
gradient, then increase that weight's $\eta_{ij}$ exponentially ($\eta_{ij} = \eta_{ij} \times$ constant) and
repeat step (2).

4. If the weight gradient changes sign, then reset $\eta_{ij}$ to its initial value $\eta$ and
estimate the optimum weight by quadratic interpolation based on the previous
gradient value. Then do several backpropagation steps using momentum but
without altering $\eta_{ij}$. After several steps, go back to step (2).

5. Continue steps (2), (3), and (4) until the system converges or is stopped.

Devos' SAB algorithm does not offer anything really different from Quickprop. Both
methods assume a quadratic error surface, so it is likely that both methods will have
difficulty on similar problems. SuperSAB is closer to Jacobs' delta-bar-delta algorithm
but provides for long flat regions where the step size should quickly grow large.
SuperSAB requires three learning rate factors. These are $\eta_+$ for the step size increase
factor, $\eta_-$ for the step size decrease factor, and $\eta_{start}$ for the initial step size for all $\eta_{ij}$.
SuperSAB proceeds as follows:

1. Set all $\eta_{ij}$ to the initial value $\eta_{start}$.

2. Perform backpropagation with momentum for epoch $n$.

3. For each $w_{ij}$, as long as the sign of the weight gradient does not change from epoch to
epoch, increase the learning rate by $\eta_{ij}(n+1) = \eta_+ + \eta_{ij}(n)$.

4. When the weight derivative changes sign, then

• undo the previous weight update that caused the change in gradient sign by
setting $\Delta w_{ij}^{n+1} = -\Delta w_{ij}^{n}$ and updating the weights.

• reduce the step size by $\eta_{ij}(n+1) = \eta_{ij}(n) - \eta_-$.

• set $\Delta w_{ij}^{n+1} = 0$, so the next backpropagation step with momentum will not
take this into account.

Note that the increase and decrease to the step size are linear. An exponential increase
and/or decrease may be substituted by implementing the following two formulae:

$$\eta_{ij}(n+1) = \eta_+ \times \eta_{ij}(n) \quad \text{and} \quad \eta_{ij}(n+1) = \eta_- \times \eta_{ij}(n)$$

Although SuperSAB has more parameters than backpropagation with
momentum, it is relatively insensitive to most of the values of those parameters. In a
wide series of pattern classification and generalization tasks, Tollenaere found that a
good value for the increase factor was $\eta_+ = 1.05$ and for the decrease factor, $\eta_- = 0.2$.
Momentum still influences the performance of SuperSAB, but it is less significant than
its effect on standard backpropagation. One drawback to SuperSAB is its slight lack of
numerical stability that seems to arise when working with more difficult tasks. This
instability is easily detectable according to Tollenaere; the total error measure literally
"explodes" in magnitude. Retrying the task with a slightly lower momentum factor
typically allows SuperSAB to converge. Tollenaere, in general, urges high values for
momentum with typical values of $\alpha = 0.9$. It is interesting to note that even with
restarting a task, SuperSAB is so much faster than backpropagation with momentum
that SuperSAB still outperforms it. Tollenaere reports speedups of several factors of
magnitude, which is very impressive. The greatest increase in performance comes from
larger size tasks, with which standard backpropagation has difficulty. With larger
tasks, SuperSAB has plenty of time to adjust the step size.
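
The SuperSAB procedure described above might be sketched for a single weight as follows. This is an illustrative sketch under stated assumptions rather than Tollenaere's code: it uses the exponential (multiplicative) form of the step size adjustment, since the quoted factors of 1.05 and 0.2 are multipliers, and it keeps the stored gradient unchanged after an undo because the weight has returned to the point where that gradient was measured. All names are hypothetical.

```cpp
#include <cassert>

// Hypothetical per-weight record for illustration.
struct SuperSabWeight {
    double w;         // weight value
    double eta;       // this weight's own step size
    double prevGrad;  // derivative seen at the last accepted position
    double prevStep;  // last weight change, used by the momentum term
};

// One epoch of SuperSAB for a single weight, given the current derivative.
void superSabStep(SuperSabWeight& s, double grad,
                  double etaUp, double etaDown, double alpha)
{
    assert(etaUp >= 1.0 && etaDown > 0.0 && etaDown < 1.0);
    assert(alpha >= 0.0 && alpha < 1.0);

    if (grad * s.prevGrad < 0.0) {
        // Sign change: undo the update that stepped over the minimum,
        // cut the step size, and clear the momentum memory.
        s.w -= s.prevStep;
        s.eta *= etaDown;
        s.prevStep = 0.0;
        // prevGrad is kept: it belongs to the restored position (assumption).
    } else {
        // No sign change: grow the step size and take a momentum step.
        s.eta *= etaUp;
        const double step = -s.eta * grad + alpha * s.prevStep;
        s.w += step;
        s.prevStep = step;
        s.prevGrad = grad;
    }
}
```

Note that no weight change is made on the epoch in which the sign change is detected; the next epoch proceeds from the restored weight with the reduced step size.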

One final problem with SuperSAB is that, as with the delta-bar-delta method, SAB,
and Quickprop, the weights are allowed to grow infinitely large for some
energy surfaces. There are several ways to solve this problem. Using a weight decay
term, $\tau$, can keep the weights within reasonable values. This gives rise to the following
weight update rule:

$$\Delta w(t) = -\eta \frac{\partial E}{\partial w} + \alpha\,\Delta w(t-1) - \tau\,w(t-1)$$

Another way to solve this problem is to limit the step size to a maximum value such as
$\eta_{max} = 10$.
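
Both remedies are simple to express in code. The fragment below is a sketch only; it assumes the update rule combines a gradient descent term, a momentum term, and a decay term proportional to the previous weight, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Weight change with momentum and weight decay:
// step = -eta * slope + alpha * prevStep - tau * prevWeight
double decayedStep(double eta, double slope, double alpha,
                   double prevStep, double tau, double prevWeight)
{
    assert(tau >= 0.0);
    return -eta * slope + alpha * prevStep - tau * prevWeight;
}

// Alternative remedy: cap an adaptive step size at a fixed maximum.
double clampStepSize(double eta, double etaMax)
{
    assert(etaMax > 0.0);
    return std::min(eta, etaMax);
}
```

The decay term pulls every weight toward zero in proportion to its size, so it matters most for exactly the weights that have grown large.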


Summary 

This  chapter  presented  the  backpropagation  learning  algorithm  and  discussed 
several  modifications  that  make  backpropagation  learning  faster.  None  of  the  research 
to  date  has  examined  the  performance  of  this  type  of  network  paradigm  in  engineering 
design  tasks.  The  following  chapters  describe  the  implementation  of  these  networks  in 
a  computer  program  and  also  its  performance  in  preliminary  structural  engineering 
design  tasks. 

Backpropagation  is  not  the  best  approach  to  learning  for  large,  difficult  tasks. 
Backpropagation  with  momentum  is  an  improvement,  but  the  last  two  modifications  to 
backpropagation,  known  as  Quickprop  and  SuperSAB,  are  faster. 

Both  Quickprop  and  SuperSAB  use  assumptions  about  the  curvature  of  the 
error  surface  to  improve  performance  without  calculating  the  Hessian  matrix.  The 
locality  constraint  introduces  weaknesses  in  both  processes;  however,  both  algorithms 
are  robust  enough  to  recover  in  most  cases. 

Quickprop's primary assumption is that the error surface is quadratic, but this
assumption is only roughly true in the vicinity of a minimum. Theoretically, Quickprop
could become very unstable since this assumption could cause "random" jumps around
weight space, but conservative values for its parameters lead to a very stable
algorithm.

SuperSAB appears to be a fast training method. It makes more conservative
assumptions  about  the  characteristics  of  the  error  surface.  It  also  seems  evident  that  it 
is  less  sensitive  to  its  learning  parameters  than  most  other  backpropagation  methods 
since  the  step  size  is  automatically  adjusted  based  on  the  given  task;  however,  when 
SuperSAB  fails,  it  fails  badly  with  the  weights  quickly  heading  towards  infinity. 

These two modifications require more storage and slightly more computational
effort than standard backpropagation; however, the significant speedup makes the cost
well worth the effort and resources. Since both Quickprop and SuperSAB use
different  heuristics  and  assumptions,  both  are  implemented  in  a  computer  program  for 
testing  on  preliminary  structural  design  tasks.  This  computer  program  is  described  in 
the  next  chapter. 


QUIKPROP  DESIGN  AND  IMPLEMENTATION 

QuikProp  is  an  artificial  neural  network  simulation  program  that  is 
implemented  using  object-oriented  programming  techniques  in  the  C++  language.  This 
chapter discusses the design and use of QuikProp. Beginning with a brief discussion of
object-oriented programming, this chapter discusses the design philosophy and the classes
that make up QuikProp. To conclude this chapter, details on the use of QuikProp are
discussed.

Object-Oriented  Programming 

There  are  many  good  object-oriented  languages  such  as  C++  [Stroustrup91, 
Lippman89],  SmallTalk,  Objective-C,  CLOS,  and  others,  each  with  good  and  bad 
points.  The  choice  of  using  C++  to  implement  QuikProp  was  primarily  based  on 
convenience  and  familiarity.  C++  was  convenient  to  use  because  of  prior  experience 
with  the  language,  its  availability  on  the  chosen  computer  architecture  (Intel  x86),  and 
the  availability  of  useful  class  libraries. 

The  goals  of  object-oriented  programming  are  to  make  writing  large  complex 
programs  simpler,  maintaining  and  changing  programs  easier,  and  extending  programs 
less  complicated  through  the  use  of  abstraction,  encapsulation,  and  inheritance 
[Rumbaugh91].  Even  though  most  programming  languages  allow  programmers  to 
apply these concepts, object-oriented languages like C++ explicitly support these
features. Object-oriented programming principles are discussed in the following
sections.

Abstraction 

In  a  general  sense,  abstraction  is  the  process  of  ignoring  details  in  order  to 
concentrate  on  what  is  essential  to  the  task  at  hand.  The  introduction  and  chapter  on 
design theory discussed abstraction and its role in design. Abstraction in both design
and programming has similar purposes. This section considers the concept of
abstraction  in  terms  of  object-oriented  programming.  Abstraction  facilitates 
development  by  focusing  on  what  an  object  is  and  does  before  deciding  how  it  is 
actually  implemented.  This  selective  examination  of  certain  parts  of  a  problem  lets  the 
development  process  isolate  what  is  important.  Abstractions  are  in  essence  incomplete 
descriptions  of  a  real  world  entity  or  concept;  however,  even  a  partial  description  of  an 
object  allows  manipulation  for  some  specific  purpose.  A  good  abstraction  captures 
crucial  aspects  of  a  problem  and  limits  the  infinite  possibilities  of  the  real  world  to  that 
portion  that  is  of  concern. 

C++ provides two types of abstraction mechanisms, procedural and data
abstraction. Procedural abstraction ignores details about processes, whereas data
abstraction ignores details of how data is represented. Most, if not all, programming
languages allow procedural abstraction by supporting user-defined functions.
Functions make large, complex programs easier to design by allowing the creation
and sequencing of logical operations.

Data abstraction always involves some degree of procedural abstraction. For
example, when a computer program adds two floating point numbers, programmers
normally ignore the details of how floating point arithmetic is performed in binary.
Many  languages  do  not  support  extensions  of  their  default  data  types.  Some  languages 
do;  however,  only  object-oriented  languages  combine  procedural  and  data  abstraction. 
This  is  done  using  the  concept  of  classes. 

Classes, once defined, describe everything about an abstract entity at once. For
example, a matrix class can describe those aspects of a matrix that are common to linear
algebra. The class describes the storage scheme in rows and columns as in linear
algebra and can also define linear algebraic operations such as matrix addition. When
objects of that class are used, the details of the class's implementation and
manipulation can be ignored. A well implemented matrix class removes the details of
how the matrix is stored and how various operations are performed with respect to the
computer implementation. This creates an additional layer of separation between an
application and the computer, making it easier to write large, complex applications.
This allows a programmer to extend the default data types to include such entities as
matrices and operate on those matrices using standard operators. For example,
to add two matrices called "A" and "B" with the result going
into a new matrix "C", the line of code would look something like

C = A + B

instead of the traditional looping operations over indices. The class would handle the
details of looping over the indices and allocating memory for the new matrix. By
designing a program around abstract entities that have their own set of operations, that
program is less dependent on implementation details. If sparse matrix techniques
needed to be used to conserve memory, the matrix class would hide the details of
implementing sparse matrices without having to change the algorithm that utilized the
matrices.
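
A minimal sketch of such a matrix class is given below to make the idea concrete. It is not the matrix class used by QuikProp; the class layout, the use of std::vector for storage, and the member names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense matrix; callers never see how elements are stored.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Element access by row and column hides the storage scheme.
    double& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    double operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    // Matrix addition: the loop over elements and the allocation of the
    // result live here, not in the code that uses the class.
    friend Matrix operator+(const Matrix& a, const Matrix& b) {
        assert(a.rows_ == b.rows_ && a.cols_ == b.cols_);
        Matrix sum(a.rows_, a.cols_);
        for (std::size_t i = 0; i < sum.data_.size(); ++i)
            sum.data_[i] = a.data_[i] + b.data_[i];
        return sum;
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;  // row-by-row storage, an implementation detail
};
```

With this class, user code can write `Matrix C = A + B;` for two matrices of equal size, and a later switch to sparse storage would change only the private section and the member function bodies.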

Encapsulation 

Encapsulation  hides  the  internal  workings  of  a  class  to  support  or  enforce 
abstraction.  This  consists  of  separating  the  external,  accessible  aspects  of  an  object 
from  the  internal  implementation  details  that  are  hidden.  This  prevents  a  program's 
parts  from  becoming  so  interdependent  that  small  changes  have  large,  possibly  adverse 
effects  on  other  components  of  the  application.  Encapsulation  lets  the  implementation 
of an object change without affecting other objects that use it. Although encapsulation
is a program design issue in any language, the ability of object-oriented
languages to combine data and behavior into a class makes encapsulation even more
powerful through the interface to a class.

A  well  designed  interface  to  a  class  is  important  for  effective  encapsulation; 
otherwise,  programmers  who  use  that  class  may  find  that  directly  accessing  a  class's 
attributes  or  data  in  order  to  perform  some  operation  is  easier  than  using  cumbersome 
interface  functions.  For  example,  a  matrix  class  might  require  reading  and  writing  the 
contents  of  matrices  to  disk.  The  implementation  of  the  matrix  class  defines  how  the 
elements  of  matrices  are  stored  in  random  access  memory.  Without  providing  an 
interface  for  reading  and  writing  matrix  elements,  each  part  of  a  program  and  each 
programmer  would  be  required  to  understand  how  the  matrix  class  manages  memory. 
When  the  matrix  class  designers  later  include  sparse  matrices,  every  part  of  the 
program  that  performed  reading  or  writing  of  matrices  might  require  changes. 
Encapsulation forces class designers to develop interfaces to their classes such that
maintenance of programs is minimized with respect to the amount of code that needs
updating.
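
The reading and writing example can be sketched as follows. This is a hypothetical class written for illustration, not part of QuikProp, and the text format it writes is an arbitrary choice. The point is that callers use only write() and read(), so a change to the private storage would not touch them.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical matrix whose storage layout is private.
class StoredMatrix {
public:
    StoredMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double get(std::size_t r, std::size_t c) const { return data_[index(r, c)]; }
    void set(std::size_t r, std::size_t c, double v) { data_[index(r, c)] = v; }

    // Writes the dimensions followed by the elements, one row per line.
    void write(std::ostream& out) const {
        out << rows_ << ' ' << cols_ << '\n';
        for (std::size_t r = 0; r < rows_; ++r)
            for (std::size_t c = 0; c < cols_; ++c)
                out << get(r, c) << (c + 1 < cols_ ? ' ' : '\n');
    }

    // Reads a matrix previously produced by write().
    static StoredMatrix read(std::istream& in) {
        std::size_t rows = 0, cols = 0;
        in >> rows >> cols;
        StoredMatrix m(rows, cols);
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t c = 0; c < cols; ++c) {
                double v = 0.0;
                in >> v;
                m.set(r, c, v);
            }
        return m;
    }

private:
    std::size_t index(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return r * cols_ + c;
    }
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// Usage: a caller saves and restores a matrix without knowing its layout.
StoredMatrix roundTrip(const StoredMatrix& m) {
    std::stringstream buffer;  // stands in for a disk file
    m.write(buffer);
    return StoredMatrix::read(buffer);
}
```

If sparse storage were introduced later, only index(), get(), set(), and the private data would change; roundTrip() and any similar caller would compile and run unchanged.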

To  enforce  encapsulation,  most  class  attributes  are  not  accessible  to  other 
classes.  One  adverse  consequence  of  encapsulation  is  an  occasional  piece  of  inefficient 
code;  however,  in  a  well  designed  class,  this  is  rare.  When  using  object-oriented 
methodology,  there  is  a  tradeoff  involved  when  it  comes  to  deciding  whether 
optimization  of  both  memory  and  code  is  more  important  than  encapsulation.  For 
prototyping  and  maintainability,  encapsulation  wins  out,  but  regardless,  a  well 
designed  class  will  have  good  interface  functions. 

Each class may define interface functions, public operations that act on its
private data. Public operations are an operational equivalent to global data in the sense
that they may be called by any program segment. Private data is data that is
directly accessible only from within the class, for both reading and writing. It is
common for different classes to define operations with the same name but with different
implementations; this is called polymorphism. A good example of polymorphism is the addition
operator  "+".  The  addition  operator  not  only  performs  numerical  addition  of  integers, 
real  numbers,  and  complex  numbers,  but  with  a  matrix  class,  the  addition  operator 
could  also  be  defined  to  perform  matrix  addition.  Thus  in  object-oriented  languages, 
we  can  extend  the  polymorphism  concept  to  our  own  classes,  which  can  define 
identically  named  operations  for  our  classes.  Operator  polymorphism  shifts  the  burden 
of  deciding  which  operator  implementation  to  call  away  from  the  available  procedures 
to  the  classes.  Since  classes  combine  both  data  and  procedural  abstraction,  each  class 
"knows"  what  operators  are  available  for  its  data. 

In procedural programming, a program typically has a data structure hierarchy
and a procedure hierarchy. The design of these hierarchies may be independent. In
object-oriented programming, classes combine the data and procedure hierarchy into a
single class hierarchy as shown in Figure 37. Designing classes and a class hierarchy
is fundamental to the organizational and functional tasks of a program. Later in this
chapter, a brief discussion of object-oriented design is presented.

Figure 37: Combining Data and Program Hierarchies into a Class Hierarchy

In procedural programs where data and procedures can be independent entities,
every called procedure is linked at compile time. In C++ some functions, called virtual
functions, are bound dynamically at run time. Without run time type resolution, a
programmer is responsible for deciding which function to call in a hierarchy of
procedures, and the program's implementation is constrained by the data and procedure
hierarchy. Since both data and procedures are encapsulated in classes that may reside in
a hierarchy and some of these procedures may be polymorphic, dynamic binding
encapsulates the implementation details of the derived class hierarchy. This also
simplifies extending the class hierarchy since the implementation of both the class types
and the hierarchy is encapsulated through the use of inheritance.
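
Dynamic binding through virtual functions can be illustrated with a short sketch. The class names below are hypothetical and are not the QuikProp classes; the sketch uses C++11 features such as override and std::unique_ptr for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

// Hypothetical base class: declares what every activation function can do.
class Activation {
public:
    virtual ~Activation() = default;
    virtual double apply(double net) const = 0;  // resolved at run time
};

class Linear : public Activation {
public:
    double apply(double net) const override { return net; }
};

class Sigmoid : public Activation {
public:
    double apply(double net) const override {
        return 1.0 / (1.0 + std::exp(-net));
    }
};

// The caller works only with the base class; which apply() runs is
// decided by the actual type of each object at run time.
double sumOutputs(const std::vector<std::unique_ptr<Activation>>& units,
                  double net)
{
    double total = 0.0;
    for (const auto& unit : units) {
        assert(unit != nullptr);
        total += unit->apply(net);
    }
    return total;
}
```

Adding a new activation function means deriving one more class; sumOutputs() needs no change, which is the extension benefit described above.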

Inheritance 

Abstraction becomes even more powerful by defining a hierarchy of classes. A
class in a hierarchy can be defined as a subtype of another class by deriving it from that
class. Similarities between classes exist for those that are derived from a common
parent class. When a common base class exists for several classes, this is a form of
abstraction since the base class can provide a high-level way to view the derived
classes. A base class specifies what the derived classes have in common; therefore,
commonality is implemented only once.

Base classes are generalizations of a group of classes. Derived classes are
specializations of parent classes and describe them in terms of additional properties
and characteristics. Just as a base class aggregates common features, derived classes
can implement only those features that are unique. Not only can features be abstracted
at an appropriate level, but so can the procedures that manipulate those features.

Inheritance builds on encapsulation by making code reuse more practical, and
as a result there are two benefits from defining a class hierarchy. First, a derived class
can share its parent class's code. Second, a derived class can share its parent class's
interface. Most class hierarchies emphasize one of these two possible benefits;
however, they are not mutually exclusive. Hierarchies designed primarily for code
reuse have different characteristics than those designed for a common interface. A
class hierarchy designed for code sharing has most of the implementation in the base
classes nearer the top of the hierarchy. Deriving a class from an existing class inherits
the functionality of the parent class and as a result reduces redundant code.

A  class  hierarchy  designed  for  interface  sharing  has  most  of  the  implementation 
in  the  derived  classes  nearer  the  bottom  of  the  class  hierarchy.  Derived  classes  just 
inherit  the  names  of  the  base  class's  member  functions,  and  the  derived  classes  provide 
their  own  code  for  those  functions.  Thus,  polymorphism  plays  a  big  role  since  derived 
classes  perform  different  operations  with  the  same  functions.  Base  classes  define  an 
abstract  model  and  derived  classes  represent  less  abstract  implementations. 
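
The two uses of inheritance can appear in the same hierarchy, as the following sketch suggests. The classes are hypothetical and are not the QuikProp classes; SignDescent in particular is an invented rule included only to give the interface a second implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical base class for weight update rules.
class LearningRule {
public:
    virtual ~LearningRule() = default;

    // Code sharing: every derived rule inherits this loop unchanged.
    void updateAll(std::vector<double>& weights,
                   const std::vector<double>& slopes)
    {
        assert(weights.size() == slopes.size());
        for (std::size_t i = 0; i < weights.size(); ++i)
            weights[i] += step(slopes[i]);
    }

protected:
    // Interface sharing: each derived rule supplies its own weight change.
    virtual double step(double slope) = 0;
};

class GradientDescent : public LearningRule {
public:
    explicit GradientDescent(double eta) : eta_(eta) {}
protected:
    double step(double slope) override { return -eta_ * slope; }
private:
    double eta_;
};

// Invented rule: moves a fixed distance against the sign of the slope.
class SignDescent : public LearningRule {
public:
    explicit SignDescent(double delta) : delta_(delta) {}
protected:
    double step(double slope) override {
        if (slope > 0.0) return -delta_;
        if (slope < 0.0) return delta_;
        return 0.0;
    }
private:
    double delta_;
};
```

Here updateAll() is written once in the base class, while step() is only named there and implemented separately in each derived class.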

Classes provide support for abstraction, encapsulation, and inheritance. They
can be organized into hierarchies that define their relationships and reduce redundant
coding. The next section describes the fundamentals of object-oriented system
design.

Object-Oriented  Program  Design 

Procedural  decomposition  is  a  top-down  structured  programming  technique 
that  treats  a  program  as  a  description  of  processes.  Each  process  is  broken  down  into 
subprocesses  until  each  subprocess  is  a  small,  efficient  code  module  that  performs  one 
task.  Analysis  starts  with  an  abstract  view  of  the  program  and  ends  up  with  a  detailed 
view  of  the  program.  Structured  program  design  works  well  for  procedural  languages 
and  has  been  used  effectively  for  over  twenty  years. 

Object-oriented  program  design  differs  from  this  technique.  Design  does  not 
start  by  analyzing  the  problem  in  terms  of  abstract  processes  and  ending  up  with 
detailed  subprocesses.  Instead,  the  problem  is  analyzed  as  a  system  of  interacting 
objects. Identifying the objects is the initial step in the design.

Object-oriented program design is not a top-down approach: a large, abstract
base class is not first identified and then broken down into smaller, detailed
subclasses. Nor is it a bottom-up process in which small classes are built up
into larger, abstract base classes. Object-oriented design involves working at
both high and low levels of abstraction at all stages of the design process.1
Object-oriented software design requires the following steps:

•  Identification  of  classes. 

•  Assignment  of  attributes  and  behavior  to  the  classes. 

•  Finding  relationships  between  classes. 

•  Arrangement  of  the  classes  into  a  hierarchy. 

Like any design process, object-oriented design is an iterative one. The
previous four steps should initially be done in the order of presentation; however, it is
likely that as the design progresses the designer's paradox2 will come into play.
Assumptions used in a previous step will be altered, and repeating a step will be
required. Good designers go through each step while regarding the consequences and
assumptions of other steps. It should be apparent that general design theory issues
from previous chapters are just as applicable here in software design. Experiential
knowledge and a good initial design speed the development process; refinement
through revisions is to be expected but should be minimized.

1 In general, no matter what artifact is being designed, whether a machine component
or a software program, there is no single design methodology that covers every
possible design situation. Good designers, regardless of their field, are flexible
individuals with a great deal of experience. They can rapidly move across levels of
abstraction, depending on the situation.

2 By designing an artifact that satisfies given requirements, we can identify further
requirements or more details of the given requirements that were unknown or
unforeseen during the initial stages of our design.

Identifying  Classes 

Since  classes  are  the  central,  active  entities  of  an  object-oriented  design,  the 
first  step  is  to  identify  classes  the  program  requires.  There  are  several  techniques  of 
varying  difficulty.  Familiarity  with  the  program's  problem  domain  will  make  the  job  of 
identifying  classes  easier,  especially  for  those  programs  that  model  physical  objects. 
Often, each physical object can become a class. Conceptual entities that have no
physical counterpart can also be potential classes. Another method for identifying classes is to describe the
program's  purpose  and  list  all  nouns  that  appear  in  the  description.  Each  noun  then 
becomes  a  candidate  class.  Events,  actions,  and  interactions  are  also  possible  classes. 

Each  candidate  class  is  meant  to  model  some  part  of  the  problem  the  program 
will  solve.  As  potential  classes  are  identified,  possible  hierarchies  may  become  clearer, 
and classes that might assist in implementing other classes become evident. Initially, all
classes and hierarchies should be treated as tentative until they are implemented.
Implementation should not occur until the iteration among the previously mentioned
four steps (class identification, attribute and behavior assignments, inter-class
relationships, and assembly into hierarchies) has converged to a realizable, conceptual
design. Convergence simply depends on when the designers are satisfied with their
efforts and believe that no further progress can be made without an implementation.

Assigning Attributes and Behavior

Once candidate classes have been identified, the next step is to decide what
each class should know and what each can do. This must be done within the context of
the program, since all of the information and operations that the program as a whole
requires must be assigned to some class or classes as their state and behavior. If there
are information or operations that no class is responsible for, then a new class may be
needed. If one class is responsible for most of the program code, then it is advisable to
divide these responsibilities among other classes. The work should be evenly
distributed among classes for abstraction, encapsulation, and inheritance to be
effective.

A  class  has  two  categories  of  responsibilities: 

1. A class must maintain its attributes (i.e., the class's data).

2. A class must execute its member functions (i.e., the operations that an object
can perform; its behavior).

Assigning  attributes  and  behavior  to  a  class  gives  a  clear  idea  of  a  class's  usefulness.  If 
a  class's  responsibilities  are  hard  to  identify,  then  it  may  not  represent  a  well  defined 
object  in  the  program.  Those  classes  identified  during  the  first  step  that  are  not  useful 
can be discarded. Likewise, if a set of attributes and behaviors is repeated in several
classes, it may describe a useful abstraction. Classes that have no attributes and only a
single member function are simply encapsulated processes and can be eliminated and
replaced with a function.

Every class has attributes, which are the properties that describe it. Every instance
of a class has a state, the current values of all of the object's attributes. Every class also
has behavior or procedures, which define how instances of the class interact with other
objects and how an object's state changes.
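
As a minimal illustration of these terms, the following C++ sketch shows a class with attributes, state, and behavior, followed by a class that has neither attributes nor more than one member function and is therefore better written as a function. The names EpochCounter, Squarer, and square are hypothetical.

```cpp
// Attributes are the class's data, the state is their current values, and
// the behavior is the set of member functions that read or change the state.
class EpochCounter {
public:
    explicit EpochCounter(int maxEpochs) : maxEpochs_(maxEpochs) {}
    void advance() { if (current_ < maxEpochs_) ++current_; }  // changes the state
    bool finished() const { return current_ >= maxEpochs_; }   // reports the state
    int current() const { return current_; }
private:
    int maxEpochs_;    // attribute
    int current_ = 0;  // attribute
};

// A class with no attributes and a single member function is only an
// encapsulated process ...
struct Squarer {
    double apply(double x) const { return x * x; }
};

// ... and can be replaced with a plain function.
inline double square(double x) { return x * x; }
```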

Identifying  Relationships  Between  Classes 

Although some classes may exist in isolation, most classes build upon and
cooperate with other classes. Once some class attributes and behaviors have been
identified, relationships between those classes can be explored. Besides inheritance,
two other relationships between classes can be identified. The first is
existence, where one class depends on the existence of another class. The second is
composition. In a composition relationship between classes, one class contains (i.e., has
as an attribute) another class.

Many  relationships  between  classes  occur  because  one  class's  interface  relies 
on  another  class's  existence.  Here,  the  first  class  may  call  member  functions  of  the 
second  class.  For  example,  a  class  that  describes  the  weights  between  neural  network 
layers  might  rely  on  another  class  to  perform  input/output  operations.  Without  the 
input/output  class,  the  neural  network  weights  class  would  be  responsible  for 
implementing  reading  and  writing  of  the  class  attributes.  Since  a  file  class  could 
encapsulate  input  and  output  to  a  disk  file,  the  weight  class  can  use  this  encapsulation 
without  reimplementing  the  code  that  actually  performs  reading  and  writing  to  a  disk 
file. 
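
The existence relationship in this example might be sketched as follows. The classes ValueStore and Weights are hypothetical stand-ins for the input/output class and the weights class; a standard stream is used in place of a disk file so that the sketch is self-contained (a std::fstream could be passed in the same way, since it is also a std::iostream).

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <vector>

// Hypothetical I/O class: encapsulates reading and writing of real values.
class ValueStore {
public:
    explicit ValueStore(std::iostream& stream) : stream_(stream) {}
    void write(double v) { stream_ << v << ' '; }
    bool read(double& v) { return static_cast<bool>(stream_ >> v); }
private:
    std::iostream& stream_;
};

// Hypothetical weights class: calls ValueStore's member functions instead
// of implementing its own reading and writing code.
class Weights {
public:
    explicit Weights(std::size_t n) : values_(n, 0.0) {}
    double& at(std::size_t i) { return values_[i]; }
    double at(std::size_t i) const { return values_[i]; }
    void save(ValueStore& store) const {
        for (double v : values_) store.write(v);
    }
    bool load(ValueStore& store) {
        for (double& v : values_)
            if (!store.read(v)) return false;
        return true;
    }
private:
    std::vector<double> values_;
};
```

Weights depends on the existence of ValueStore but does not contain one; the store is supplied by the caller each time weights are saved or loaded.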

Classes that contain other classes have a containing relationship in which one
class is composed of one or more other classes and built-in types. Composition should
not be confused with inheritance, which is discussed in the next section. In a
composition relationship, more than just an interface is shared between classes since a
class's implementation depends on the other class. For example, a matrix class could be
implemented  with  a  vector  class  in  terms  of  computer  storage  where  a  matrix  might  be 
considered  as  a  vector  of  vectors.  This  also  provides  a  layer  of  encapsulation  since  the 
interface  to  the  matrix  class  should  be  independent  of  how  the  matrix  class  is 
implemented.  Later,  if  the  implementation  of  the  matrix  class  is  changed  by  perhaps 
eliminating  the  use  of  the  vector  class,  the  interface  to  the  matrix  class  does  not 
change. 
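
A minimal sketch of this composition, assuming hypothetical Vector and Matrix classes written only for illustration:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vector class used as a building block.
class Vector {
public:
    explicit Vector(std::size_t n = 0) : data_(n, 0.0) {}
    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }
private:
    std::vector<double> data_;
};

// Hypothetical matrix class stored as a vector of Vector rows.
// Clients use only rows(), cols(), and operator(), so the row storage could
// later be replaced without changing the interface.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows, Vector(cols)), cols_(cols) {}
    std::size_t rows() const { return rows_.size(); }
    std::size_t cols() const { return cols_; }
    double& operator()(std::size_t r, std::size_t c) { return rows_[r][c]; }
    double operator()(std::size_t r, std::size_t c) const { return rows_[r][c]; }
private:
    std::vector<Vector> rows_;
    std::size_t cols_;
};
```

Because element access goes through operator(), code that uses Matrix would not need to change if the rows were later stored in a single contiguous array.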

When  identifying  relationships  between  classes,  consideration  must  be  given  to 
how  a  class  performs  its  assigned  behavior.  The  following  questions  are  useful  for 
exploring  class  relationships: 

1. Does this class need to know information that is maintained by other classes?

2. Does this class utilize the behavior of other classes?

There is no need to give a class too much information about its context since some of
this information is maintained by its relationship to other classes. A class's interface
should provide all the access to a class's attributes that is required by its relationships
to other classes; otherwise, encapsulation breaks down and the effects of modifying a
class are not contained. After identifying relationships between classes, each class's interface
becomes more defined with a class's behavior being divided between member functions
and attributes.

Creating  a  Class  Hierarchy 

Creating class hierarchies is an extension of identifying classes; however, it
requires knowledge gained from assigning attributes and behavior to each class and
recognizing relationships between classes. Each class's attributes and behavior assist
in recognizing similarities and differences between classes, and relationships between
classes help to determine those classes that need to incorporate the functionality of
other classes. A properly constructed class hierarchy lets similar classes share
functionality and attributes by utilizing polymorphism, inheritance, and abstraction.

Polymorphism  allows  calling  of  member  functions  without  specifying  the  exact 
type  of  the  class.  In  C++  this  is  accomplished  by  using  virtual  functions.  Classes  in  a 
hierarchy  can  share  similar  functionality  by  having  the  same  procedure  names  but  have 
different implementations of those procedures. Likewise, classes that have the same
attributes can be placed in a hierarchy, which allows those classes to share attributes.
If  similar  classes  only  differ  by  a  few  categories  but  have  identical  member  functions, 
then  they  should  not  be  implemented  as  separate  classes.  The  different  categories  can 
be  identified  through  different  instantiations  of  attributes. 
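
The last point can be sketched in C++. Here, as an assumed example, input, hidden, and output layers share identical member functions, so the category is stored as an attribute of one class instead of being expressed as three classes. The names LayerKind and Layer are hypothetical.

```cpp
#include <cstddef>

// The category that would otherwise distinguish three separate classes.
enum class LayerKind { Input, Hidden, Output };

// One class serves all three categories; instances differ only in the
// values given to their attributes.
class Layer {
public:
    Layer(LayerKind kind, std::size_t units) : kind_(kind), units_(units) {}
    LayerKind kind() const { return kind_; }
    std::size_t units() const { return units_; }
    bool receivesExternalInput() const { return kind_ == LayerKind::Input; }
private:
    LayerKind kind_;
    std::size_t units_;
};
```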

There are two ways for one class to reuse the code of another: composition and
inheritance. They imply different
relationships between classes. Composition allows a class's attributes to be made up of
instantiations of other classes. In this case one class has another class. Inheritance, on
the other hand, is when one class is a specialization of another class; it is a kind of
the other class. Inheritance is usually used when polymorphism is employed, because it allows
references to an object through that object's parent classes. With composition, there is
no implicit relationship between a composite class and the classes that make up its
attributes. When one class needs to use another class's functionality more than once,
composition is usually more appropriate than inheritance.

When creating a class hierarchy, it is desirable to place common features as
high in the hierarchy as possible in order to maximize their reuse. Adding abstract
classes high in the hierarchy increases the ability to reuse a class. Abstract classes are
those that are not intended to have instances but define common attributes and
behavior for other classes lower in the hierarchy.
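
The distinction between "is a kind of" and "has a," together with the role of an abstract class, can be sketched as follows. The names Net, BpNet, and describe are hypothetical and are not meant to reproduce any actual class interface.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Abstract class: not intended to have instances; it defines behavior
// common to every network class lower in the hierarchy.
class Net {
public:
    virtual ~Net() = default;
    virtual std::string paradigm() const = 0;
};

// Inheritance ("is a kind of"): a BpNet can be referenced through Net.
// Composition ("has a"): BpNet uses the vector type twice, once for each
// set of activations, so the vectors are attributes rather than base classes.
class BpNet : public Net {
public:
    BpNet(std::size_t inputs, std::size_t outputs)
        : input_(inputs, 0.0), output_(outputs, 0.0) {}
    std::string paradigm() const override { return "backpropagation"; }
    std::size_t inputs() const { return input_.size(); }
    std::size_t outputs() const { return output_.size(); }
private:
    std::vector<double> input_;
    std::vector<double> output_;
};

// Works for any class derived from Net, present or future.
std::string describe(const Net& net) { return net.paradigm(); }
```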

A class must be designed for two types of clients: those that use its
functionality and those that inherit from it. During class design, decisions about how
these  two  types  of  clients  will  interface  with  the  class  must  be  made.  Care  must  be 
taken  not  to  violate  the  principle  of  encapsulation  by  allowing  derived  classes  access  to 
a  base  class's  implementation.  These  concepts  are  illustrated  in  the  following  section 
that  discusses  the  design  and  implementation  of  QuikProp. 
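
In C++ the two kinds of clients are usually served by the public and protected sections of a class, with the implementation kept private from both. The following is a minimal sketch with hypothetical names (Network, CountingNetwork):

```cpp
// Hypothetical base class designed for two kinds of clients.
class Network {
public:     // interface for clients that use the class
    virtual ~Network() = default;
    int epochsRun() const { return epochs_; }
    void trainOneEpoch() { adjustWeights(); ++epochs_; }
protected:  // interface for clients that inherit from the class
    virtual void adjustWeights() = 0;
private:    // implementation, hidden even from derived classes
    int epochs_ = 0;
};

// A derived class supplies the protected operation but cannot touch epochs_.
class CountingNetwork : public Network {
public:
    int adjustments() const { return adjustments_; }
protected:
    void adjustWeights() override { ++adjustments_; }
private:
    int adjustments_ = 0;
};
```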

Design and Implementation of QuikProp

QuikProp  is  designed  as  a  general  neural  network  simulator  that  is  able  to 
easily  incorporate  and  accommodate  different  network  topologies  and  dynamics.  From 
experience  gained  with  McClelland  and  Rumelhart's  neural  network  simulators 
[McClelland88],  it  was  seen  that  a  general  framework  for  network  development  would 
be  beneficial.  Although  McClelland  and  Rumelhart  did  incorporate  several  networks 
within  a  general  framework  for  simulating  neural  networks,  the  code  was  not 
necessarily  efficient  nor  easily  extensible.  It  was  anticipated  that  a  general  feedforward 
network  as  presented  by  McClelland  and  Rumelhart  would  need  to  be  extensively 
modified  in  order  to  test  and  use  different  extensions  to  feedforward  networks; 
therefore, a new network simulator was started as QuikProp.

Goals and Requirements

One of the motivations for this research is to investigate the learning capabilities
of  artificial  neural  networks  in  order  to  demonstrate  how  connectionist  systems  might 
deal  with  the  knowledge  bottleneck  problem  that  plagues  many  automated  design 
tools.  In  addition,  since  a  primary  source  of  a  designer's  knowledge  is  from  experience, 
it  was  desirable  to  present  good  designs  to  a  network  and  have  the  network  learn  from 
these  good  designs.  A  feedforward  neural  network  with  backpropagation  of  error  was 
a  useful  starting  point.  Instead  of  writing  several  programs  that  each  simulate  different 
networks,  it  was  felt  that  encapsulating  network  characteristics  in  a  class  hierarchy 
would  enable  efficient  exploration. 

The  design  of  QuikProp  has  gone  through  several  iterations  starting  with  the 
McClelland  and  Rumelhart  program  called  "bp"  [McClelland88]  that  implements  a 
form  of  backpropagation  learning  in  a  feedforward  network.  The  program  "bp" 
quickly  allowed  experimentation  with  a  backpropagation  system.  Unfortunately,  "bp" 
was  relatively  slow,  had  limited  capabilities  for  real-time  monitoring  of  learning,  had 
cryptic  input  and  output,  and  had  a  limitation  on  the  size  of  the  network.  Of  these 
limitations,  speed  of  operation  was  the  initial  motivation  for  looking  for  an  alternative 
network  simulator. 

Research on improving the learning capabilities of backpropagation
networks led to several variations of backpropagation networks [Stornetta87,
Jacobs88, Fahlman89, Tollenaere90] that indicated that learning time could be reduced
by an order of magnitude. Implementing one or more of these improvements in a
neural network simulator seemed a promising way to reduce learning time. It was anticipated
that this research would require a more robust and extensible simulator; therefore,
developing a simulator would be beneficial for the following reasons:

•  Network  sizes  could  be  expected  to  grow  large. 

•  Network  paradigms  may  change. 

•  Computational  efficiency  would  be  important  as  networks  grew  in  size. 

•  Post-processing  and/or  real-time  display  of  the  network's  dynamics  would 
facilitate  analysis. 

The result was the first iteration of QuikProp, which provided an initial
framework for an object-oriented implementation of a connectionist system. This
implementation provided not only a fast backpropagation network simulator but also
several modifications of backpropagation that improved learning speed. Two of these
modifications were the pseudo second-order methods called Quickprop and
SuperSAB. Because of its robustness and far superior performance over traditional
backpropagation, the simulator was named QuikProp.

The  current  version  of  QuikProp  improves  the  encapsulation  of  neural  network 
concepts,  outputs  key  network  dynamics  variables  to  output  files  for  post-processing, 
allows  for  interruption  and  restart,  furnishes  diagnostic  output,  and  provides  for 
flexible  input  to  the  network.  Real-time  monitoring  of  network  parameters  is  also 
available, but it is not in graphical form. The primary limitation of this simulator is that
only feedforward backpropagation type networks have been implemented within this
framework, but provisions have been made to incorporate other network algorithms
and paradigms. The object-oriented design of the simulator is discussed next.

All classes that are used in QuikProp are derived either directly from a root
class (CObject) provided by the Microsoft Foundation Class Library (MFC)
[Microsoft93] or from other classes derived from CObject. Although the primary
purpose of using MFC is to facilitate writing graphical user interfaces, there are
benefits to extending MFC by deriving the network classes from MFC classes. These
benefits are summarized next.

By deriving from the CObject class, object diagnostic, run-time class
information, and object persistence services are made available to the derived classes
because of encapsulation and inheritance. Object diagnostic services include both
diagnostic printing of internal object data and object validity checking for an object's
internal consistency. Run-time class information allows access to the class name at
run-time and safe casting of base class pointers to derived classes. Object
persistence allows saving a complex network of objects to permanent storage. In
addition, MFC provides a number of memory diagnostic features for dynamic memory
allocation that make program debugging easier. All these services are
described in [Microsoft93].

Design 

Computational  efficiency,  extensibility,  and  maintainability  are  the  primary 
design  requirements  for  QuikProp.  Backpropagation  type  networks  are  inherently 
computationally  expensive;  therefore,  wherever  possible,  efficient  numerical  coding 
techniques  were  used.  Since  there  is  no  single  network  paradigm  that  will  effectively 
solve all tasks in our problem domain, QuikProp must be easily modifiable. It was
anticipated that we would need to change and add to the program as we explored the
capabilities of artificial neural networks. We should be able to easily change how a
network runs, its topology, and its learning dynamics, and to get debug and run-time
information. These design goals and requirements helped guide identification of
classes, assignment of class attributes and behavior, identification of relationships
between classes, and arrangement of classes.

Using past experience with "bp" and other artificial neural network
simulators, it was easy to identify that QuikProp would require basic matrix and
vector  classes.  Matrix  and  vector  classes  facilitate  the  implementation  of  any  known 
neural  network  paradigm  since  most  algorithms  use  concepts  from  linear  algebra  to 
express  their  dynamic  activities.  Therefore,  at  the  lowest  levels,  each  network 
paradigm  would  rely  on  these  classes.  Creating  these  classes  would  also  encapsulate 
the  basic  numerical  processes  of  linear  algebra  into  efficient  member  functions  of  these 
classes.  The  implementation  of  the  neural  network  algorithms  would  be  done  in  higher 
abstract  classes. 

The  next  class  that  was  identified  was  a  high  level  abstract  base  class  from 
which  each  neural  network  paradigm  would  be  derived.  All  artificial  neural  networks 
are  made  up  of  computational  neurons  arranged  in  a  network.  The  neurons  respond  to 
input  by  calculating  some  activation  value,  and  using  the  activation  value,  they 
produce  an  output  value.  A  neural  network  is  defined  by  a  topology,  representational 
scheme,  and  its  dynamics.  Based  on  this  description,  a  general  network  class  was 
identified  that  could  be  composed  of  instantiations  of  a  neuron  class,  layer  class,  input 
class,  activation  class,  output  class,  and  learning  class. 

This  scheme  initially  seemed  appropriate,  but  it  did  not  interface  well  with  the 
lower  level  matrix  and  vector  classes.  Since  efficiency  was  so  important,  each 
computational neuron could be identified and represented by an activation level. All
neuron activations in a layer would most efficiently be represented by a vector;
therefore, we would not gain anything from abstracting what is required of a neuron
into its own class. Since the input to a layer is the output from the previous layer
and the output from a neuron is a function of its activation, there was no need to
create separate input and output classes. Thus, all neuron activities would most
efficiently be represented through instantiations of the vector class, with the
activation/output values as the elements of the vectors.

The  topology  of  a  network  can  be  defined  from  the  weights  that  connect 
neurons.  These  weights  are  effectively  represented  by  a  matrix  with  dimensions  that 
correspond  to  the  number  of  neurons  in  connected  layers.  The  weight  values 
themselves  are  the  elements  of  the  matrices. 
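
With activations held in vectors and weights in matrices, propagating activity from one layer to the next reduces to a matrix-vector product followed by an activation function. The sketch below assumes a logistic activation function and a separate bias vector, and it uses standard library containers rather than QuikProp's own matrix and vector classes; the names Vec, Mat, sigmoid, and propagate are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
// weights[j][i] connects unit i in the lower layer to unit j in the upper
// layer, so the matrix has (upper units) rows and (lower units) columns.
using Mat = std::vector<Vec>;

inline double sigmoid(double net) { return 1.0 / (1.0 + std::exp(-net)); }

// Propagate one layer: each upper-layer unit sums its weighted inputs plus
// a bias and passes the sum through the logistic function.
Vec propagate(const Mat& weights, const Vec& bias, const Vec& lower) {
    Vec upper(weights.size());
    for (std::size_t j = 0; j < weights.size(); ++j) {
        double net = bias[j];
        for (std::size_t i = 0; i < lower.size(); ++i)
            net += weights[j][i] * lower[i];
        upper[j] = sigmoid(net);
    }
    return upper;
}
```

The output vector of one call can be passed directly as the lower-layer vector of the next, which is why no separate input or output class is needed.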

Every  network  will  need  to  do  some  of  the  following  tasks: 

• Run: Given external stimuli, produce outputs.

• Train: Given training data, alter the weights to produce the correct output.

• Test: Verify that specific patterns are properly represented within the
network.

•  Perform  general  input/output  operations  such  as  get  input  stimuli,  get  training 
sets,  save  weights,  and  save  outputs. 

•  Interact  with  users  to  get  general  network  parameters  such  as  topology 
definitions. 

Feedforward  networks  such  as  backpropagation  networks  have  different  algorithms 
and  dynamics  than  harmony  theory  networks,  so  a  general  abstract  network  class 
should represent the commonalities that all artificial neural networks share. Different
network  paradigms  and  algorithms  should  be  represented  in  separate  classes. 
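
One way to express this division of responsibility is an abstract base class that declares the run and train tasks and implements the test task once in terms of them. This is only a sketch under assumed names (AbstractNet, EchoNet, Pattern); EchoNet is a trivial stand-in paradigm, not a real learning algorithm, and none of the signatures are taken from QuikProp.

```cpp
#include <cstddef>
#include <vector>

using Pattern = std::vector<double>;

// Hypothetical abstract network: declares what every paradigm must provide.
class AbstractNet {
public:
    virtual ~AbstractNet() = default;
    virtual Pattern run(const Pattern& stimulus) = 0;
    virtual void train(const Pattern& input, const Pattern& target) = 0;

    // Test is common to all paradigms: every output must lie within the
    // tolerance of its target value.
    bool test(const Pattern& input, const Pattern& target, double tolerance) {
        Pattern out = run(input);
        if (out.size() != target.size()) return false;
        for (std::size_t i = 0; i < out.size(); ++i) {
            double diff = out[i] - target[i];
            if (diff < 0) diff = -diff;
            if (diff >= tolerance) return false;
        }
        return true;
    }
};

// Stand-in paradigm that simply remembers the last target it was shown.
class EchoNet : public AbstractNet {
public:
    Pattern run(const Pattern&) override { return memory_; }
    void train(const Pattern&, const Pattern& target) override { memory_ = target; }
private:
    Pattern memory_;
};
```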



It was not anticipated that harmony theory networks would be implemented
in a general network simulation program due to identified limitations; however, since
the science and theory of artificial neural networks are still in the development stages,
flexibility is a desirable characteristic of this simulation program. At the initial stage,
QuikProp should implement standard backpropagation [Rumelhart86c],
backpropagation with momentum [McClelland88], Quickprop [Fahlman89],
SuperSAB [Tollenaere90], and combinations of modified activation functions,
activation function derivatives, and error functions. Should a separate class be created
for each variation of backpropagation and for each activation function and error function?

Observing the differences between each of these candidate classes, it is evident
that a backpropagation network class is required as a derived class from the abstract
network class, but separate classes for each backpropagation variation are not
necessary, nor are activation and error function classes essential. Instead, optional
vectors  and  matrices  holding  required  weight  and  activation  values  could  be  created 
for  the  backpropagation  class  for  each  variation,  and  different  member  functions  could 
accomplish  the  variations  for  the  different  algorithms  and  desired  activation  and  error 
functions.  Thus,  different  network  types  could  be  developed,  but  slight  mutations  in 
the  networks  would  not  necessitate  individual  classes.  This  decision  is  questionable 
since  it  might  impose  unforeseen  limitations  on  the  simulator. 
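
The design choice can be sketched with two of the variations, standard gradient descent and gradient descent with momentum, selected inside a single class. The class name BpWeights, the enumeration UpdateRule, and the member names are hypothetical; the update formulas are the textbook forms, not code taken from QuikProp.

```cpp
#include <cstddef>
#include <vector>

enum class UpdateRule { Standard, Momentum };

class BpWeights {
public:
    BpWeights(std::size_t n, UpdateRule rule, double learningRate, double momentum = 0.0)
        : rule_(rule), rate_(learningRate), momentum_(momentum), w_(n, 0.0) {
        // The extra array is allocated only for the variation that needs it.
        if (rule_ == UpdateRule::Momentum) previousStep_.assign(n, 0.0);
    }
    // One entry point; a different member function carries out each variation.
    void update(const std::vector<double>& gradient) {
        if (rule_ == UpdateRule::Momentum) updateWithMomentum(gradient);
        else updateStandard(gradient);
    }
    double weight(std::size_t i) const { return w_[i]; }
private:
    void updateStandard(const std::vector<double>& g) {
        for (std::size_t i = 0; i < w_.size(); ++i) w_[i] -= rate_ * g[i];
    }
    void updateWithMomentum(const std::vector<double>& g) {
        for (std::size_t i = 0; i < w_.size(); ++i) {
            double step = -rate_ * g[i] + momentum_ * previousStep_[i];
            w_[i] += step;
            previousStep_[i] = step;
        }
    }
    UpdateRule rule_;
    double rate_;
    double momentum_;
    std::vector<double> w_;
    std::vector<double> previousStep_;  // optional: stays empty for the standard rule
};
```

Adding a further variation means adding one more optional array and one more member function, which keeps the class count small at the cost of a larger backpropagation class.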

The  abstract  network  class  could  be  responsible  for  the  majority  of  input  and 
output  of  weights,  inputs,  and  output  activations.  It  could  also  be  responsible  for 
getting  network  topology  parameters  and  general  network  debugging  and  run-time 
information. Specifics would of course be implemented in the derived network class.

One last class was identified to represent training pairs for networks that
employ supervised learning dynamics. This class's primary responsibility is to explicitly
keep the input and expected output vectors associated and organized.

The resulting class hierarchy is quite simple and flexible and is shown in Figure 38. It is
derived from a root class, CObject, which supplies basic debugging capabilities. CNet is
the abstract network class. CBp is derived from CNet and realizes backpropagation type
networks. CMatrix is the matrix class; CVector is the vector class, and CVecPair is the
vector pair class. Figure 39 shows the class composition relationships. It is important to
note that both the matrix and vector classes are independent of one another for numerical
efficiency reasons. Appendix B details the member functions and member variables for all
classes developed for implementing QuikProp.

Figure 38: QuikProp's Class Hierarchy

Figure 39: QuikProp's Class Composition

Using QuikProp
QuikProp  requires  both  command  line  input  and  input  data  from  files.  This 
section describes all options for running QuikProp. QuikProp is not an interactive
program, but it does allow for pausing and restarting during long training sessions.
QuikProp  identifies  different  networks  using  the  major  portion  of  a  filename. 
QuikProp  uses  the  filename  extension  to  identify  different  types  of  input  and  output 
files.  There  are  five  types  of  input  files  depending  on  the  execution  mode  that  is  set  on 
the  command  line.  Each  of  the  input  files,  their  contents,  and  applicable  execution 
modes  are  described  next.  Following  the  descriptions  of  the  input  files,  the  command 
line  variables  and  execution  options  are  described.  Finally,  there  are  three  types  of 
output  files  that  QuikProp  creates  depending  on  the  network  options  and  execution 
mode,  and  these  are  described  last. 

Input  Files 

Of all the input files, only the definition file is always required, no matter what
execution mode is chosen. The other four input files depend on the chosen execution
mode.  QuikProp  will  use  the  given  network  name  to  find  all  required  files  by  adding 
the  appropriate  filename  extension.  Thus,  all  files  will  have  the  network  name  as  a 
root.  The  network  name  can  include  the  path  to  the  input  files  if  they  are  located  in  a 
directory  other  than  where  QuikProp  exists. 
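
The naming rule amounts to appending an extension to the network name. A minimal sketch follows; the function name networkFile and the example network name are hypothetical, and the use of a period as the separator is an assumption based on the usual filename convention.

```cpp
#include <string>

// Build the name of a file from the network name given on the command line
// (optionally including a path) and a filename extension such as "DEF".
std::string networkFile(const std::string& networkName, const std::string& extension) {
    return networkName + "." + extension;
}
```

For a network named truss, networkFile("truss", "DEF") gives truss.DEF.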

Definition  file 

The definition file has "DEF" as the filename extension. The definition file is
required for all execution modes of QuikProp since it defines the neural
network topology and network parameters. The input data in the definition file is free
form and requires a textual label for each data item. Not all data items are required,
and comments can be entered by placing a colon (:) as the first character in a line of
input. QuikProp will skip blank lines. When QuikProp reads a flag type of data item, a
value of 1 indicates that the flag is on, and a value of 0 indicates the flag is off.
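
A reader for this kind of file might look like the following sketch. The exact layout of a data item is not specified here, so the sketch assumes one item per line written as a label followed by whitespace and a value; that layout, and the function names readDefinition and flagIsOn, are assumptions made for illustration only.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Read "LABEL value" items, skipping blank lines and comment lines whose
// first character is a colon. The one-item-per-line layout is an assumption.
std::map<std::string, std::string> readDefinition(std::istream& in) {
    std::map<std::string, std::string> items;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == ':') continue;  // blank line or comment
        std::istringstream fields(line);
        std::string label, value;
        if (!(fields >> label)) continue;              // whitespace-only line
        std::getline(fields >> std::ws, value);        // rest of the line is the value
        items[label] = value;
    }
    return items;
}

// A flag item is on when its value is 1 and off when it is 0 or absent.
bool flagIsOn(const std::map<std::string, std::string>& items, const std::string& label) {
    auto it = items.find(label);
    return it != items.end() && it->second == "1";
}
```

Because each item carries its own label, items may appear in any order and optional items may simply be left out.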

The input data in the definition file can be divided into two sections. The first
section contains those values that most networks can use. The second contains those inputs
that are specific to backpropagation type networks. Table 5 describes the general
network input data. Footnotes describe details of individual data items that require
further explanation.

Table 5: General Network Definition Input

| Label | Description | Type | Required |
|---|---|---|---|
| INPUTS | Number of input units. | integer | Y |
| OUTPUTS | Number of output units. | integer | Y |
| CALC_STATS | Flag indicating required calculation and display of training statistics. | integer | N |
| MAX_EPOCHS | Maximum number of epochs to run. QuikProp can be restarted after completing MAX_EPOCHS. | integer | Y |
| SAVE_EPOCHS | Number of epochs between saving system data. Setting SAVE_EPOCHS to a value less than MAX_EPOCHS provides restart capabilities. | integer | N |
| SYSTEM_TOLERANCE | Average pattern tolerance to halt processing. The average error of all training patterns must be less than this value to halt processing. | real | Y |
| PATTERN_TOLERANCE | Minimum pattern tolerance to halt processing. Every training pattern's error must be less than this value to halt processing. | real | Y |
| OUTPUT_STATS | Flag indicating writing of training statistics to the "mat" output file. | integer | N |
| SMART_LEARNING | Flag indicating concentration during learning on difficult patterns. This presents patterns with large errors to the system with more frequency. | integer | N |
| MAJORITY | Percentage of patterns correct for a majority. Used with SMART_LEARNING to tighten the SYSTEM_TOLERANCE once this percentage of patterns has met the current SYSTEM_TOLERANCE. Required if SMART_LEARNING is selected. | real | N |
| TOLERANCE_DECAY | Fraction to decrease SYSTEM_TOLERANCE once MAJORITY achieved. Required if SMART_LEARNING is selected. | real | N |
| INITIAL_TOLERANCE | Initial average pattern tolerance to halt processing. Required if SMART_LEARNING is selected. | real | N |
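
The two halting tolerances in Table 5 can be read as a pair of tests on the per-pattern
errors. The following is a minimal sketch of that reading, not QuikProp's actual code.
It assumes that both tests must pass before training halts and that the "average error"
is the arithmetic mean of the pattern errors; the function name and arguments are
hypothetical.

```python
def should_halt(pattern_errors, system_tolerance, pattern_tolerance):
    """Return True when both tolerance tests described in Table 5 pass.

    Assumption: the system test uses the mean pattern error, and both the
    system test and the per-pattern test must hold at the same time.
    """
    if not pattern_errors:
        return False
    average_error = sum(pattern_errors) / len(pattern_errors)
    every_pattern_ok = all(e < pattern_tolerance for e in pattern_errors)
    return average_error < system_tolerance and every_pattern_ok
```

Under this reading, a single badly learned pattern keeps training going even when the
average error is already small.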

The  following  table  describes  input  in  the  definitions  file  that  is  specific  to 
backpropagation  type  networks. 


Table 6: Backpropagation Network Definition Input

| Label | Description | Type | Required |
|---|---|---|---|
| HIDDEN_LAYERS | Number of hidden layers. | integer | N |
| HIDDEN_LAYER_SIZES | List of integers (separated by commas) specifying number of neurons in each hidden layer, starting at layer below input. Required if HIDDEN_LAYERS is non-zero. | integers | N |
| MOMENTUM | Momentum term, α. (Maximum of 10.0 for numerical stability.) | real | N |
| LEARNING_RATE | Learning rate or initial step size for SuperSAB. | real | Y |
| INIT_RANGE | Weight and bias initialization range. Specifies the ± range of values within which the weight and bias arrays will be initially randomized. QuikProp takes conservative maximum range of ±2.0. | real | Y |
| EPOCH | Flag indicating epoch (batch) training. | integer | N |
| MAX_ACTIVATION | Defines activation range for neurons. The minimum value is always MAX_ACTIVATION - 1.0. | real | N |
| SIGMOID_PRIME_SHIFT | Value to shift sigmoid derivative function. Small positive shifts less than 0.75 are allowed. | real | N |
| NONLINEAR_ERROR | Flag indicating use of nonlinear error function; when set, causes QuikProp to use the hyperbolic arctangent function when determining the error. | integer | N |
| SECOND_ORDER | Flag indicating Quickprop algorithm. Cannot be used in conjunction with the ADJUST_STEPSIZE flag. | integer | N |
| MAX_GROWTH | Factor that limits growth of weight terms in Quickprop and SuperSAB. | real | N |
| PERMUTE | Flag indicating permutation of pattern presentation between epochs. | integer | N |
| WEIGHT_DECAY | Factor to decrease weight terms, x. | real | N |
| ADJUST_STEPSIZE | Flag indicating SuperSAB algorithm. | integer | N |
| MAX_STEP | Maximum allowed step size. | real | N |
| LINEAR_INCREASE | Flag indicating linear increase formula, else exponential. | integer | N |
| LINEAR_DECREASE | Flag indicating linear decrease formula, else exponential. | integer | N |
| STEP_INCREASE | Factor to increase step size in SuperSAB. | real | Y |
| STEP_DECREASE | Factor to decrease step size in SuperSAB. | real | Y |

This completes the network definition file parameters.

Training  file 

The training file, which has "FCT" as a filename extension, defines the training
pairs to be presented to the network. The training file is required when QuikProp is in
the LEARN mode. This file uses free form input with spaces between vector
elements and a comma between vectors. Thus, each training pair of vectors is on one
line of the training file. Vector elements are real numbers. It is useful to include a
comment line that describes each input and output neuron. A comment line has a colon
as the first character.
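
The training file layout described above can be illustrated with a small reader. This is
only a sketch of the stated format (space-separated elements, one comma between the
input and output vectors, comment lines starting with a colon); the function name is
hypothetical, and skipping blank lines in this file is an assumption.

```python
def parse_training_text(text):
    """Parse training pairs from text laid out like a training file.

    Each data line holds an input vector and an output vector separated by
    one comma; elements within a vector are separated by spaces.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines (assumed) and comment lines (leading colon).
        if not line or line.startswith(":"):
            continue
        input_part, output_part = line.split(",")
        inputs = [float(v) for v in input_part.split()]
        outputs = [float(v) for v in output_part.split()]
        pairs.append((inputs, outputs))
    return pairs


example = ": span load, depth\n0.2 0.4, 0.9\n\n0.6 0.1, 0.3\n"
training_pairs = parse_training_text(example)
```

The same reader would apply to any file that shares this one-pair-per-line layout.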

Scale  factors  file 

The magnitude of neuron output values depends on the activation function for
the neurons. QuikProp only uses the sigmoid function as its activation function.
QuikProp allows setting a scale factor that defines an offset to the sigmoid function
(see the MAX_ACTIVATION definition value in the definition file description above).
When training a network, QuikProp must make sure that all training pairs have
elements that fall between the values of 0 and 1. If you have set a
MAX_ACTIVATION value, QuikProp will take care of the offset in the training
values; however, QuikProp does not require that the maximum and minimum training
pair elements are 1 and 0, respectively. If you want to specify a smaller range, you can
do so using the scale factors file. If you do not want to specify a scale factor,
QuikProp will scan all training pair vectors to determine a scale factor for each input
and output neuron based on the maximum and minimum values found.

The scale factor file has "SCL" as the filename extension. This file is only used
in LEARN mode and is optional. If it exists, then QuikProp will use it; otherwise,
QuikProp will automatically set the scale factors. Each input and output neuron can
have a separate scaling factor. If you use a scale factor file, you must supply a scale
factor for each input and output neuron. This file has a similar form to the training file
but only requires two vector pairs. The first vector pair in the file contains the
minimums for the input and output neurons, respectively. The two vectors of a pair are
separated by a comma, as in the training file. The next vector pair contains the
maximums for the input and output neurons.
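
The automatic scan for minimums and maximums can be sketched as follows. This is an
illustration under stated assumptions, not QuikProp's own routine: it assumes a linear
mapping of each neuron's values onto the range 0 to 1, and the function names are
hypothetical.

```python
def scan_scale_factors(vectors):
    """Return per-neuron (minimums, maximums) found across all vectors."""
    columns = list(zip(*vectors))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(vector, minimums, maximums):
    """Linearly map each element onto 0..1 using its neuron's min and max."""
    scaled = []
    for value, low, high in zip(vector, minimums, maximums):
        # A neuron whose values never vary carries no range; map it to 0.
        scaled.append(0.0 if high == low else (value - low) / (high - low))
    return scaled


inputs = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
minimums, maximums = scan_scale_factors(inputs)
```

Supplying minimums and maximums by hand, as the scale factor file does, would simply
replace the result of the scan.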

Test  file 

The  test  file  has  "TST"  as  a  filename  extension.  This  file  is  required  when 
QuikProp  is  run  in  TEST  mode.  The  purpose  of  the  test  file  is  to  evaluate  how  well 
the  network  learned  specific  patterns  and  how  well  it  can  generalize.  When  training  a 
network,  a  collection  of  training  pairs  that  encompass  the  entire  spectrum  of  examples 
is placed into the training file. The network learns from these examples, and when all the
patterns have been learned, the final set of pattern errors and total system error give an
indication of how well the network learned from those examples. It is always a good
idea to withhold several examples for testing. Using these vector pairs of input and
expected output values, the network generates a pattern error for each test
pattern pair. This error gives an indication of how well the network learned the
concepts inherently contained in the training file and how well it can generalize from
the learned knowledge. If the pattern errors are high, the network is unlikely to
perform well on inputs that it has not specifically learned, and the learned knowledge
is of little use. The remedy is to change the network topology, the training pattern
pairs, or both, and then retrain the network. The next chapter gives some guidelines
and suggestions for developing training and testing data sets and determining the
network topology.

The test file is identical in form to the training file. It is composed of a set of
vector  pairs,  one  pair  per  line  of  input;  each  vector  in  a  pair  is  separated  by  a  comma. 
Spaces  separate  elements  in  a  vector.  The  first  vector  in  each  pair  is  the  input  vector, 
and  the  ensuing  vector  is  the  expected  output  vector.  For  each  vector  pair  presented  to 
the  network  for  testing,  QuikProp  produces  a  pattern  error  value. 

Input  file 

The  input  file  has  a  filename  extension  of  "IN".  It  is  used  for  presenting  new 
input  patterns  to  the  network.  This  file  is  required  when  QuikProp  is  run  in  RUN 
mode.  It  is  made  up  of  a  series  of  input  vectors,  one  per  line.  The  network  will 
produce  an  output  file  containing  the  result  of  presenting  each  input  vector  to  the 
network. 

Execution  Modes 

QuikProp has three execution modes: LEARN, TEST, and RUN. The execution
mode is set at run-time on the command line, and only one execution mode option can
be given. QuikProp's command line has three entries. The first entry is the program
name itself: QUIKPROP. The second entry is required and defines the network name.
All input and output files use this name as their base name; therefore, you can prefix
the network name with a path that defines where all input and output files exist. The
third entry fixes the run mode and is also required. The next three sections describe
each run mode and how to invoke it. Once QuikProp is started, there is no way to
change execution modes until that mode is finished.

Learn  mode 

LEARN mode is initiated by starting QuikProp with the "-mL" command line
option. When QuikProp starts in LEARN mode, it opens and reads the definitions file
for  the  specified  network.  After  processing  the  definitions  file,  QuikProp  opens  and 
reads  a  scale  factor  file,  if  it  exists,  before  opening  and  reading  the  training  file.  If  a 
weights  file  exists,  QuikProp  will  open  and  read  it.  The  weights  file  allows  QuikProp 
to  continue  training  from  a  previously  stopped  epoch.  After  all  data  is  read  and 
processed,  learning  commences  and  continues  until  QuikProp  reaches  the  maximum 
epoch  number  or  until  the  system  converges. 

Test  mode 

TEST mode is begun by starting QuikProp with the "-mT" command line
option. When QuikProp starts in TEST mode, it opens and reads the definitions file for
the specified network. After processing the definitions file, QuikProp opens and reads
a scale factor file, if it exists, before opening and reading the test file. Once the test file
is read, QuikProp opens and reads the weights file. For each pattern pair in the test file,
QuikProp presents the input vector to the network and calculates the output.
QuikProp then calculates the pattern error by comparing the calculated output to the
expected output before processing the next input vector. Once all input vectors are
processed, QuikProp halts execution.

Run  mode 

RUN mode is begun by starting QuikProp with the "-mR" command line
option. When QuikProp starts in RUN mode, it opens and reads the definitions file for
the specified network. After processing the definitions file, QuikProp opens and reads
the input file and then the weights file. For each input vector in the input file,
QuikProp calculates an output vector and writes it to the output file. Once all input
vectors are processed, QuikProp halts execution.
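
The relationship between the three mode options and the files each mode reads can be
summarized in a small lookup. This is a sketch built only from the descriptions above;
the table and function names are hypothetical, joining the network name and extension
with a period is an assumption, and the definitions file is left out because its
extension is not given in this section.

```python
# Files read by each execution mode, keyed by the command line option.
MODES = {
    "-mL": {"name": "LEARN", "required": ["FCT"], "optional": ["SCL", "WTS"]},
    "-mT": {"name": "TEST", "required": ["TST", "WTS"], "optional": ["SCL"]},
    "-mR": {"name": "RUN", "required": ["IN", "WTS"], "optional": []},
}


def files_for(network_name, mode_option):
    """Return (mode name, required files, optional files) for a run."""
    mode = MODES[mode_option]
    required = [f"{network_name}.{ext}" for ext in mode["required"]]
    optional = [f"{network_name}.{ext}" for ext in mode["optional"]]
    return mode["name"], required, optional
```

For example, `files_for("frames", "-mT")` lists the test and weights files as required
and the scale factor file as optional, which matches the TEST mode description.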

Output  Files 

There are three output files that QuikProp may create depending on the run
mode chosen. The following paragraphs describe the contents of each file.

Output file

QuikProp creates an output file with the filename extension "OUT" when
executing in RUN mode. The output file contains the output activation levels of the
output layer neurons. Each input pattern will produce an output pattern in this file.

Plot  file 

In LEARN mode, QuikProp will append training statistics to a readable plot
file that has the "PLT" filename extension. The plot file is only written if the
OUTPUT_STATS flag is set. Each row of information in this file contains the
aggregate training statistics for one epoch of training. The following information is
written to the plot file: epoch number, total system error, average weight magnitude,
and average weight change. The total system error is a measure of learning
completion. The average weight magnitude is a measure of weight space saturation.
Both Quickprop and SuperSAB allow weights to grow without bound in magnitude. If
the weights get too large, numerical problems will develop. On the other hand, if
weights get too small, weight space becomes flat without well defined minimums. The
average weight change indicates the effectiveness of each step during training. If this
value is large, large steps are being taken, hopefully towards a minimum. If this value
is small, there might not be much progress being made toward the minimum. Used in
conjunction with the average weight magnitude, it might help indicate when network
paralysis occurs and no further learning is taking place.
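
One plausible way to compute a row of these statistics is sketched below. The exact
definitions QuikProp uses are not given here, so this sketch assumes that the total
system error is the sum of the pattern errors and that both weight averages are means
of absolute values; the function name is hypothetical.

```python
def epoch_statistics(epoch, pattern_errors, weights, previous_weights):
    """Return (epoch, total error, mean |weight|, mean |weight change|)."""
    total_error = sum(pattern_errors)
    count = len(weights)
    average_magnitude = sum(abs(w) for w in weights) / count
    average_change = sum(
        abs(w - p) for w, p in zip(weights, previous_weights)
    ) / count
    return epoch, total_error, average_magnitude, average_change


row = epoch_statistics(3, [0.5, 0.25], [1.0, -3.0], [0.5, -2.0])
```

A large average magnitude together with a very small average change is the pattern
described above as a possible sign of network paralysis.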

Weights  file 

The weights file has "WTS" as the filename extension. It is a binary file that
contains the last network state. Included in this file are the last epoch number from
training, current weight values, current change in weight values, and the current
biases. If the Quickprop algorithm is being used, then the last epoch's error derivatives
are also saved. If SuperSAB is the paradigm of choice, then the learning rate matrix is
also saved.

Training can take a long time. The weights file not only serves to keep the
adjusted weights for later TEST and RUN mode processing, but also provides QuikProp
with restart capabilities. QuikProp is designed to periodically save the network state
so that training effort will not be lost to an ill-timed power outage or other
catastrophe.

Summary
This  chapter  provided  a  brief  description  of  object-oriented  programming  and 
program  design.  It  then  related  these  concepts  to  the  design  of  QuikProp,  a  neural 
network  simulator.  QuikProp  implements  general  feedforward  neural  networks  and 
specifically supports most variations of backpropagation and two pseudo
second-order algorithms, Quickprop and SuperSAB, that were detailed in the previous
chapter. Each of QuikProp's run modes, which are designed for training, testing, and
recalling patterns from the network, was described. Finally, all files that QuikProp
requires and creates were enumerated.


PRELIMINARY  DESIGN  STUDIES 

This  chapter  presents  the  results  and  insights  gained  from  applying  feedforward 
artificial  neural  networks  to  preliminary  structural  design  problems.  A  number  of  case 
studies  are  presented  along  with  the  performance  of  the  networks  in  training  and 
running for these studies. A section on neural engineering, that is, how to use
artificial neural networks, prefaces the tests. This section is based on experiences in
neural network modeling and provides useful insights into the characteristics of
problems that are well suited to neural networks, how to set up problems including
training and test cases, and network size and topology issues.

Neural  Engineering 
Neural engineering can be described as a set of loose rules and theory
pertaining to the selection of a neural paradigm, the size and makeup of the network,
and problem setup for good performance. Performance is measured in terms of problem
generalization, ability to learn, and recall. A network's ability to learn a problem is
important due to time and computing constraints. In theory any problem, regardless of
size and complexity, can be learned by some artificial neural network since one of the
goals of artificial neural networks is to model the brain; however, the problem may be
so complex and large that existing computer hardware and software cannot solve it in
a reasonable amount of time.

We must constantly remind ourselves that even the most sophisticated
implementations of artificial neural networks barely model very primitive natural neural
systems. Even then, there is considerable debate as to how accurately and adequately
artificial neural systems simulate biological systems. Consider that the human brain
probably has on the order of 10^10 neurons and each neuron has 1,000 to
100,000 interconnections [Rumelhart86a, page 131]. Also consider that humans use
several processes for learning and call upon all five senses (vision, hearing, touch,
taste, and smell) to interact with the environment. Finally, consider that humans spend
their entire lives learning. Humans, and even the lowest neural systems in nature, are
more sophisticated than current artificial implementations. Learning times measured in
days, which are typical for artificial neural networks, cannot be compared to the time,
flexibility, and resources available to natural systems.

Research  in  artificial  neural  networks  is  still  at  the  preliminary  stages.  At  the 
present  time,  neural  dynamics  only  crudely  approximate  biological  dynamics.  Still, 
specialized  artificial  neural  networks  show  interesting  properties  that  can  be  harnessed 
to solve difficult problems, provided that the problems are properly represented and
the networks properly configured. This section addresses these issues, and the
following sections in this chapter give preliminary structural engineering design
examples.

Problem  and  Paradigm  Selection 

Too often the wrong method is applied to the right problem. It is important to
understand the types of problems that are well suited to artificial neural networks. In
general, artificial neural networks map one vector space to another. Where this
mapping relationship is unknown or too complex to comprehend, artificial neural
networks may provide the proper computational tool to investigate and perhaps solve
the problem. Problems that possess well known closed form solutions, or that are
solvable using existing mathematical techniques, are not good candidates for neural
networks. Artificial neural networks tend to require more computation to produce less
accurate results than closed form solutions, so the payoff from using a neural network
is diminished.

There are three basic distinctions between neural network paradigms. First,
networks differ in the form of their outputs, either continuous or binary.
Second, networks differ in the type of training, either supervised or unsupervised.
Supervised training requires a set of input-output vector pairs. The vector pairs are
presented to the network and the weights are adjusted such that the error between the
calculated output and the expected output is minimized. Unsupervised training only
requires a set of input vectors. The weights of the network are adjusted based on a
distance measure that groups similar input vectors so that they produce similar outputs.
Third, networks are either recurrent or nonrecurrent. Recurrent networks have
feedback connections and appear well suited to systems that have dynamic behavior,
since the feedback provided by recurrent neurons may expand the range of neuron
responses; however, their training processes are complicated and time consuming.
Nonrecurrent networks with a hidden layer of nonlinear neurons are
completely general for static mappings. With enough hidden layer neurons, these types
of networks can approximate any continuous mapping of one vector space to another
[Hornik89]. There are numerous neural network paradigms available, but these three
distinctions seem to encompass the major characteristics.

The  choice  of  paradigm  for  a  specific  problem  must  address  the  following 
issues: 

•  Is  the  problem  better  suited  to  supervised  or  unsupervised  learning? 

•  Will  learning  be  performed  once  and  then  the  network  recall  information,  or 
will  the  problem  require  constant  updating  over  the  life  of  the  application  using 
incremental  learning? 

•  Can  binary  input  and  output  values  be  used  or  are  continuous  input  and  output 
values  required? 

•  Will  network  capacity  be  limited  in  terms  of  number  of  neurons  and  layers? 

•  How  efficient  are  learning  dynamics? 

•  How  efficient  are  recall  dynamics? 

•  Can  a  nonrecurrent  network  be  used? 

•  Is  the  problem  one  of  mapping  vector  spaces  or  of  storing  and  recalling  the 
stored  pattern? 

Considering  these  questions  can  help  lead  to  a  paradigm  well  suited  to  a  problem.  In 
this  application,  nonrecurrent,  supervised  learning,  continuous  valued  input  and  output 
neurons,  and  efficient,  one-time  learning  were  important  features  since  these  seemed  to 
emulate  particular  characteristics  of  preliminary  design.  One  compromise  that  differs 
from  human  oriented  preliminary  design  is  non-incremental  learning.  Static  learning 
systems  were  chosen  because  of  tractability  and  efficiency. 

Generalization 

Generalization capabilities allow an artificial neural network to approximate a
correct output in response to an input vector that was not in the training set. However,
generalization is affected by the complexity of a network and the characteristics of the
training set.

Just  as  polynomial  interpolation  approximates  values  between  those  in  a  given 
data  set,  artificial  neural  networks  estimate  a  correct  output  vector  from  a  given  input 
vector not in the training set. Polynomial interpolation generally works well when
estimating values that are between data points, and usually performs poorly when
extrapolating outside the bounds of the data points. Neural networks exhibit similar
unreliability when given a problem that is outside the scope of the training data.

The analogy of neural networks to polynomial interpolation can also give an
indication as to the size of a network that could generalize well. To generalize well,
we essentially would like to create an "overdetermined" system where the number of
known values exceeds the number of unknown values. The number of unknown values
is equal to the number of weights that must be determined, and the number of known
values is equal to the number of output values we are trying to generate over the whole
training set. For example, if a network requires two input neurons and two output
neurons and five training pairs are available, the number of known values would be
ten. Two hidden neurons in a fully connected configuration without any direct
connections between input and output neurons would have eight unknown weights, so
the problem would be overdetermined. When the number of known values is less
than the number of unknown values, the problem is "underdetermined" and would
either have no solution or more than one solution. If three hidden neurons were used
instead, the number of unknown weights would be twelve, and the problem would be
underdetermined. Overdetermined problems generally do not have an exact solution,
but it is possible to generate an approximate solution such that the answers come close
to the desired output. The next three sections on network size, hidden neurons, and
training sets are all related back to the goal of achieving good generalization.
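
The counting in the two-input, two-output example can be reproduced with a few lines of
code. This sketch uses the usual convention that a system is overdetermined when the
known values outnumber the unknowns. It counts connection weights only (no bias terms),
as the example in the text does, and the function names are hypothetical.

```python
def weight_count(layer_sizes):
    """Count weights in a fully connected layered network (biases excluded)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def known_value_count(output_neurons, training_pairs):
    """Count target output values available across the training set."""
    return output_neurons * training_pairs


def classify(layer_sizes, training_pairs):
    """Compare known values with unknown weights for a given network."""
    unknowns = weight_count(layer_sizes)
    knowns = known_value_count(layer_sizes[-1], training_pairs)
    if knowns > unknowns:
        return "overdetermined"
    if knowns < unknowns:
        return "underdetermined"
    return "determined"
```

With five training pairs, `classify([2, 2, 2], 5)` compares ten known values with eight
weights, while `classify([2, 3, 2], 5)` compares the same ten values with twelve weights.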

Network  Size 

When  preparing  a  problem  to  be  solved  by  a  neural  network,  it  is  important  to 
identify  important  features  of  the  problem  domain.  It  is  extremely  inefficient  to  gather 
all  possible  features  and  separate  them  into  input  and  output  for  two  reasons.  First, 
this  will  create  a  larger  than  necessary  network,  which  will  increase  learning  time. 
Second, working with data that does not affect the network results can produce
network paralysis, where the network is unable to continue learning. As the network
searches through weight space, spurious weight connections can produce irrelevant
error signals that cause the system to move farther from a minimum.

Familiarity  with  the  problem  domain  is  the  best  way  to  identify  important 
features.  It  is  best  to  start  with  a  minimum  set  of  input  and  output  features.  Each 
feature is then associated with one or more neurons. Some features may require
multiple neurons for proper representation, whereas a single continuous neuron may
suffice in other cases.

In  general,  each  feature  chosen  should  have  some  meaning.  For  example, 
consider  the  preliminary  design  of  a  steel  frame  building.  A  network  is  being 
developed  to  identify  promising  frame  configurations.  The  frame  will  be  hidden  by  a 
facade. If one candidate feature for representation is frame color, but frame color is
not important since it will be hidden, why have color as a feature? Other features are
much more important, and a color feature could falsely distinguish different examples
and prevent good generalization and recall.

Another useful technique is to aggregate features. Reducing the number of
features will generate a more efficient network, so consideration should always be
given to combining multiple related features into a single feature. One way to do this is
to identify functions of features. Considering the frame design example again, if frame
stiffness and mass are important features, it might be possible to aggregate these into a
single feature that is a function of these two features, e.g., natural frequency. It is
important  to  understand  that  aggregation  of  features  can  also  hide  important 
distinctions.  For  example,  if  cost  is  a  significant  feature,  but  it  has  been  aggregated 
into  a  weight  feature,  this  might  hide  items  that  are  important,  such  as  maintenance 
costs  or  fabrication  costs  that  do  not  have  any  direct  relationship  with  a  structure's 
dead  weight.  Familiarity  with  the  problem  domain  and  understanding  inherent 
limitations  of  artificial  neural  networks  will  provide  guidance  in  these  instances. 
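
As an illustration of the stiffness and mass aggregation mentioned above, the sketch
below combines the two features into one natural frequency value. It assumes the frame
can be idealized as a single-degree-of-freedom system, for which the undamped natural
frequency is sqrt(k/m) / (2π); the function name and units are hypothetical choices.

```python
import math


def natural_frequency_hz(stiffness_n_per_m, mass_kg):
    """Undamped natural frequency of a single-degree-of-freedom idealization."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


# Two features (stiffness, mass) collapse into one network input.
frequency = natural_frequency_hz(4.0e6, 1.0e4)
```

Two frames with different stiffness and mass but the same ratio receive the same
aggregated value, which is exactly the kind of hidden distinction the text warns about.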

The larger the network, the more training pairs are necessary in order to
generalize well [Baum89]. More complex networks, in terms of the number of neurons
and weights, are capable of learning more patterns; therefore, the network needs a
large training set to define all that it can learn. Too many neurons in a hidden layer
results in the network performing simple recall without generalizing. Networks that
cannot generalize essentially perform as a lookup function from a database of
examples. On the other hand, a network that is too small suffers from the opposite
problem of not accurately depicting all the training data. If the number of input and
output neurons is set by identifying requisite features, then the number of hidden layers
and neurons in those layers is determined by the required accuracy and generalization.

Hidden Neurons

A  single  hidden  layer  is  ideal.  Backpropagation  training  of  networks  tends  to 
slow  down  by  an  order  of  magnitude  every  time  a  layer  is  added  to  a  network.  There 
are  two  reasons  for  this  phenomenon.  First,  more  processors  require  more 
computational  effort.  The  second  reason  is  not  quite  so  clear  but  perhaps  more 
important.  As  the  error  is  propagated  back  through  the  network,  the  error  signal  is 
diluted  as  it  passes  through  each  layer.  Therefore,  the  modification  of  weights  in  the 
early  layers  is  slow,  which  results  in  overall  slow  learning.  One  way  of  reducing  this 
effect  is  to  have  direct  connections  between  all  layers.  Direct  connections,  or  so  called 
short-cut  connections,  consist  of  trainable  weights  as  in  typical  connections,  but  they 
provide  direct  feedback  of  errors  to  earlier  layers  of  neurons.  Direct  connections  do 
not  supplant  normal  layer  to  following  layer  connections  but  augment  network 
topology.  Unfortunately,  the  current  implementation  of  the  QuikProp  program  does 
not  allow  for  direct  connections. 

This limitation can be offset by using fewer hidden layers with more hidden
neurons per layer. Baum and Haussler identified the following general issues
concerning hidden layer size [Baum89]:

•  The  number  of  training  cases. 

•  The  amount  of  noise  in  the  training  data. 

•  The  desired  accuracy  of  generalization. 

•  The  training  method. 

•  The  type  of  activation  functions. 

Another technique for determining the number of hidden neurons that has merit
but is ordinarily impractical is to develop a training set and a cross validation set of
examples [Morgan90]. Using these two example sets, a measure of the generalization
error can be developed. Then, starting with a single hidden neuron, the number of
hidden neurons is increased incrementally until the minimum generalization error is
found. This method is unrealistic for applications for two reasons. First, a cross
validation set of examples is almost impossible to derive since it is difficult enough to
develop complete training sets for real-world applications. Second, the time required
to perform these tests is prohibitive for practical problems.
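
The incremental search just described can be sketched as a simple loop. This is an
outline under stated assumptions rather than a recommended procedure: the
`validation_error` callback is hypothetical and stands for training a network with the
given number of hidden neurons and returning its error on the cross validation set,
and the search is capped at an assumed `max_hidden`.

```python
def select_hidden_neurons(validation_error, max_hidden):
    """Return (count, error) with the lowest cross validation error.

    validation_error(count) is assumed to train a network with `count`
    hidden neurons and return its error on the cross validation set.
    """
    best_count, best_error = None, float("inf")
    for count in range(1, max_hidden + 1):
        error = validation_error(count)
        if error < best_error:
            best_count, best_error = count, error
    return best_count, best_error


# Stand-in errors for illustration only; a real run would train a network.
made_up_errors = {1: 0.4, 2: 0.2, 3: 0.1, 4: 0.15, 5: 0.3}
best = select_hidden_neurons(made_up_errors.__getitem__, 5)
```

Each call to the callback is a full training run, which is the cost that makes the
method prohibitive for practical problems.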

There  are  two  heuristics  along  with  the  interpolation  analogy  that  provide  a 
starting  point  for  the  number  of  hidden  nodes.  They  are  experientially  based  without 
any  mathematical  theory  involved;  therefore,  they  are  not  reliable.  The  first  heuristic  is 
from  Lang  and  Witbrock  [Lang89]  and  is  simply  that  each  connection  weight  can 
easily  learn  1.5  bits  of  information.  This  heuristic  does  not  take  into  account  the 
number  of  hidden  layers  nor  explicitly  the  number  of  neurons  in  the  input  or  output 
layers.  It  is  a  rough  heuristic  that  typically  gives  a  sufficient  estimate,  but  depends  on  a 
fairly  complete  training  set. 

The  second  heuristic  is  based  on  a  public  communication  from  Scott  Fahlman. 
He  suggests  using  the  following  formulae  for  one  and  two  hidden  layers:

where,  p  and  q  are  the  number  of  neurons  in  the  first  and  second  hidden  layers,
respectively,  and  n  and  m  are  the  number  of  neurons  in  the  input  and  output  layers, 
respectively. 

The  first  heuristic  requires  that  the  training  set  embody  a  close  estimate  of  the 
amount  of  information  to  be  learned.  With  a  complete  training  set,  a  good  estimate  of 
the  number  of  bits  of  information  contained  is  available.  Rarely  is  this  true  for 
preliminary  design  problems,  but  when  appropriate,  this  is  an  acceptable  initial 
measure.  In  addition,  this  heuristic  provides  a  lower  bound  on  the  effectiveness  of  each 
weight.  If  a  weight  can  easily  learn  1.5  bits  of  information,  then  each  weight  has  the 
potential  for  learning  more,  albeit  with  more  difficulty.  The  second  heuristic  does  not 
account  for  the  number  of  training  cases.  It  is  more  parametric  and  easier  to  apply. 
These  two  heuristics  provide  a  good  starting  point  for  determining  the  number  of 
hidden  nodes,  but  they  are  heuristics. 

Training 

A network produces a correct output only if it has been trained on an input that is
sufficiently similar. A training set must provide a full and accurate representation of a
problem domain. For example, suppose we wish to train a network to estimate the required
strength of a wing section given a description of the applied loading. If we trained the
network using only data from low performance aircraft, then we could expect inaccurate
responses when we input applied loading descriptions from high performance aircraft.
This phenomenon is similar to numerical extrapolation problems. The behavior of a neural
network is based on a model of physical reality as represented in a given training set.

Ideally, a training set should be a statistically representative sample of a
problem domain. For design problems this is unrealistic. What we hope to achieve is
to provide learnable training data. In preliminary design, our goal is to
identify promising alternatives. Promising alternative designs are feasible designs that
satisfice design requirements. Because we cannot "accurately" depict an entire feasible
design space in a training set, we must be aware that some resulting preliminary
designs will not be feasible. In this sense we are extrapolating beyond the physical
reality represented in our training data.

Baum and Haussler address the question of when a network can be expected to
generalize well. Based on their experiments, they found that for continuously valued
inputs and outputs in feedforward networks, upper and lower bounds for the number
of necessary training pairs are:

$$\Omega\left(\frac{W}{\varepsilon}\right) \le m \le O\left(\frac{W}{\varepsilon}\log\frac{N}{\varepsilon}\right) \qquad (3)$$

where m is the number of required training pairs; W is the number of weights in the
network; N is the number of neurons in the network; and ε is the desired generalization
error, defined as:

$$0 < \varepsilon \le \frac{1}{8} \qquad (4)$$

This heuristic is questionable since a number of researchers have found it to
be overly conservative [le Cun90, Morgan90]. Morgan and Bourlard have shown
through experiments that not only is this number overly conservative, but, more
significantly, the generalization error does not increase in proportion to changes
in the number of hidden nodes. Essentially, Morgan and Bourlard show that the
generalization error is relatively insensitive to the number of hidden nodes within a
reasonable range. The most important point is to measure how well a network has
trained by keeping several test cases aside in order to increase confidence in
the network's performance.
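
To get a feel for how conservative the Baum and Haussler bounds are, the two order-of-magnitude terms can be evaluated directly. The sketch below is an illustration under stated assumptions: the constant factors hidden by the asymptotic notation are dropped, the natural logarithm is assumed, and the neuron count of 13 (5 input, 5 hidden, 3 output) is chosen only as an example.

```python
import math


def baum_haussler_orders(num_weights: int, num_neurons: int, eps: float):
    """Return (W/eps, (W/eps) * ln(N/eps)), the two terms bounding m.

    Constant factors hidden by the asymptotic notation are omitted, and the
    natural logarithm is an assumption made for this sketch.
    """
    if not 0.0 < eps <= 0.125:
        raise ValueError("eps must satisfy 0 < eps <= 1/8")
    lower = num_weights / eps
    upper = lower * math.log(num_neurons / eps)
    return lower, upper


# Illustrative values: 75 weights, 13 neurons (5 input + 5 hidden + 3 output).
lower, upper = baum_haussler_orders(num_weights=75, num_neurons=13, eps=0.125)
```

With 75 weights and a generalization error of 0.125, the lower term alone comes to 600 training pairs, far more than a small design problem can usually supply.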

The  quality  of  the  data  that  make  up  a  training  set  is  important  with  respect  to 
performance.  The  source  of  training  data  can  be  objective  and/or  subjective. 
Subjective  data  are  open  to  interpretation.  Objective  data  are  unequivocal  in  meaning. 
Subjective  data  may  lead  to  sets  of  contradictory  training  pairs  that  will  slow  down 
learning  and  will  produce  controversial  results.  Objective  data  should  be  used  in 
making  up  training  data  wherever  possible;  however,  the  nature  of  design  precludes 
completely  objective  data.  The  remainder  of  this  chapter  presents  several  preliminary 
design  problems  that  explore  the  utility  of  feedforward  artificial  neural  networks  in  this 
problem  domain. 

Beam  Design  Example 
The  beam  design  example  from  Appendix  A  and  initially  presented  in  the 
chapter  on  design  theory  and  methodology  is  a  tractable  design  problem  that 
incorporates  multiple  conflicting  goals  with  a  range  of  potential  and  easily  identifiable 
solutions.  Several  test  cases  are  run  that  explore  backpropagation  networks  and  their 
ability to solve preliminary design problems. The design problem is defined as follows:
given one or more design requirements, determine a preliminary design that satisfies the
requirements. A preliminary design is characterized by the type of solution from a set
of solutions along with initial estimates of the design's properties.

In order to differentiate between designs with respect to satisfying the different
requirements, a cost function was developed. Meeting each design requirement has a cost,
and each performance measure of a design has a cost. Table 7 summarizes the design
requirements considered, and Table 11, Table 12, and Table 13 summarize the solutions
and possible design properties.


Table 7: Requirements

Weight
Fabrication
Maintenance
Stress
Displacement

Table 8: Fabrication Costs

Support   Circular   I-Section   Channel
Roller    2          1           2
Pin       2          1           1
Fixed     1          1.5         1.5

Figure 40: Maintenance Costs for Support Types (Fixed: 2, Pin: 1.5, Roller: 1)


The cost of meeting the maintenance requirement is a function of the type and number of
supports the solution uses. The basic maintenance cost for each type of support is
shown in Figure 40. The cost of meeting the fabrication requirement is based on the type
of cross section and supports a solution uses. Table 8 shows the fabrication costs for
each combination of cross section and support type. In Appendix A, representative
quantities were given for each of the cross sections in Table 12. As a result, weights,
displacements, and stresses could be calculated for each beam solution (see Appendix A).
The cost of meeting the weight requirement is a function of the cross section type and
relative quantity of material (small, medium small, medium, medium large, and large).
Table 9 shows the basic weight costs for each cross section. The basic weight cost values
were derived from the cross section area normalized with respect to the smallest cross
section area. Cross sections that do not use much material have a lower cost than those
using more material. The cost for meeting the displacement requirement is based on the
maximum magnitude displacement for each solution. These displacements are then
normalized with respect to the minimum displacement of all solutions. The cost for
meeting the stress requirement is based on the maximum flexural stress in each solution.
These stresses are normalized with respect to the minimum stress occurring in the
complete set of solutions.

The  relevant  costs  associated  with  a  section  type  and  beam  solution  are  added 
together  to  form  a  cost  vector  of  five  solutions  for  each  section  type.  For  example,  if 
we  wished  to  design  a  simple  beam  using  a  circular  cross  section  while  accounting  for 
low  maintenance  and  low  weight  requirements,  the  cost  is  determined  as  follows: 

Given:  5  circular  cross  section  types  and  simple  beam. 

MaintenanceCost  =  1.5  +  1.0  (from  Figure  40,  1  pin  and  1  roller) 


Table 9: Basic Weight Costs

Section Type   Relative Size   Weight Cost Value
Circular       Small           1.53
Circular       Medium Small    4.45
Circular       Medium          5.85
Circular       Medium Large    10.86
Circular       Large           12.96
I-Section      Small           1.11
I-Section      Medium Small    1.78
I-Section      Medium          3.78
I-Section      Medium Large    6.89
I-Section      Large           8
Channel        Small           1
Channel        Medium Small    2.67
Channel        Medium          5.33
Channel        Medium Large    7.11
Channel        Large           9.33

$$\text{WeightCost} = \begin{bmatrix} 1.53 \\ 4.45 \\ 5.85 \\ 10.86 \\ 12.96 \end{bmatrix} \quad \text{(from Table 9)}$$

$$\text{CircularSectionCost} = \begin{bmatrix} 1.53 \\ 4.45 \\ 5.85 \\ 10.86 \\ 12.96 \end{bmatrix} + \begin{bmatrix} 2.5 \\ 2.5 \\ 2.5 \\ 2.5 \\ 2.5 \end{bmatrix} = \begin{bmatrix} 4.03 \\ 6.95 \\ 8.35 \\ 13.36 \\ 15.46 \end{bmatrix}$$


Similar calculations are made for each section type. As a result, there are 15 possible
simple beam solutions when considering all three cross section types. When all four
beam solution types are taken into account, there are 60 resulting designs for the
requirements of low maintenance and low weight. The costs for each beam type are
aggregated into a cost vector, with the result being four cost vectors of 15 elements,
each vector representing a beam solution. Cost vectors can be arranged such that
cost contours may be plotted. Examples of cost contour plots are given in Appendix
A.
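
The cost vector calculation above can be reproduced in a few lines. The sketch below is illustrative: the dictionaries transcribe the maintenance costs of Figure 40 and the weight costs of Table 9, while the function name `cost_vector` and the ordering of elements within the vector are assumptions made for this example.

```python
# Maintenance cost per support type (Figure 40) and weight costs (Table 9).
MAINTENANCE = {"fixed": 2.0, "pin": 1.5, "roller": 1.0}
WEIGHT = {
    "circular": [1.53, 4.45, 5.85, 10.86, 12.96],
    "i-section": [1.11, 1.78, 3.78, 6.89, 8.0],
    "channel": [1.0, 2.67, 5.33, 7.11, 9.33],
}


def cost_vector(supports, sections=("circular", "i-section", "channel")):
    """Cost of each (section, size) pair for low maintenance and low weight."""
    maintenance = sum(MAINTENANCE[s] for s in supports)
    return [
        round(maintenance + w, 2)
        for section in sections
        for w in WEIGHT[section]
    ]


# A simple beam has one pin and one roller support.
simple_beam = cost_vector(["pin", "roller"])
circular_only = cost_vector(["pin", "roller"], sections=("circular",))
```

Here `circular_only` reproduces the circular section costs of the worked example, and `simple_beam` is the 15-element cost vector for the simple beam.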

Table 10: Fabrication Costs

Support   Circular   I-Section   Channel
Roller    2          1           2
Pin       2          1           1
Fixed     1          1.5         1.5

Problem  Definition  and  Setup 

This  design  problem  is  first  examined  with  respect  to  setting  up  the  network 
for  each  experiment.  A  total  of  eight  experiments  are  run  to  test  the  network's  ability 
to  learn  and  generalize.  Although  this  design  problem  is  small,  it  is  representative  of
many  preliminary  design  problems,  and  unlike  most  preliminary  design  problems,  the
solution  space  is  fully  explored  within  the  context  of  its  definition.

The  objective  is  to  select  a  design  solution  class  that  is  represented  by  a  type  of 
beam  with  various  end  supports  and  cross-section  characteristics.  Considering  all 
possible  combinations  of  solutions,  there  are  60  possible  solutions  for  any  set  of  design 
requirements.  Each  solution  may  be  identified  by  the  types  of  end  support,  the  type  of 
cross-section,  the  moment  of  inertia  of  the  cross-section,  and  the  resulting  cost  of  the 
beam. 

The number and type of input neurons are directly a function of the design
requirements. Five binary input neurons are used, one for each design
requirement shown in Table 7. A value of one indicates that the corresponding design
requirement is essential, and a value of zero implies that the design requirement is not
essential.
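
A minimal sketch of this input encoding follows. The order in which requirements are assigned to input neurons is an assumption (taken here from the listing order in Table 7), and the names `REQUIREMENTS` and `encode` are illustrative only.

```python
from itertools import product

# Order of the five requirements as listed in Table 7 (assumed neuron order).
REQUIREMENTS = ("weight", "fabrication", "maintenance", "stress", "displacement")


def encode(essential):
    """Map a set of essential requirements to the five binary inputs."""
    unknown = set(essential) - set(REQUIREMENTS)
    if unknown:
        raise ValueError(f"unknown requirements: {sorted(unknown)}")
    return [1 if name in essential else 0 for name in REQUIREMENTS]


# Every input pattern with at least one essential requirement.
all_inputs = [list(bits) for bits in product((0, 1), repeat=5) if any(bits)]

example = encode({"maintenance", "weight"})
```

Excluding the all-zero pattern leaves 31 distinct input vectors, which is the number of requirement combinations available to this design problem.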

The number and type of output neurons are identified by examining the required
output. There are several possible ways of representing the supports. The
chosen method is efficient in terms of limiting the number of neurons without penalizing
expressiveness. Each class of beam is represented by an integer value as shown in
Table 11.

Table 11: Beam Solution Classes

Description             Index
Simply Supported Beam   1
Cantilever Beam         2
Propped Beam            3
Fixed End Beam          4

There are three types of cross sections that can be chosen: circular, I, or
channel shapes. Each of these is given an index from one to three, respectively, as
shown in Table 12. There are five different selections for moments of inertia for each
section. These are given indices from one to five as shown in Table 13. There is an
obvious link between the type of cross-section and its corresponding moment of
inertia. Considering that we are concerned with preliminary design, the exact values of
moments of inertia are immaterial, but their relative size is significant when we apply
qualitative terms to each index.

Table 12: Cross Sections

Description   Index
Circular      1
I             2
Channel       3

There are four possible output neurons that have either stepped integers within
various bounds or continuous valued output. The possible output neurons represent
beam class, cross section type, size of the cross section, and beam cost.
In all cases, neuron output is represented as continuous valued output, with results
rounded to the nearest integer where appropriate. Output neuron one represents the
solution class with a valid range from 0.5 to 4.5 (Table 11). Output neuron two
represents the type of cross-section with a valid range from 0.5 to 3.5 (Table 12).
Output neuron three represents the qualitative size of the moment of inertia of the
beam with a valid range from 0.5 to 5.5 (Table 13). Output neuron four represents the
cost. From the design example problem definition, the valid range is from 0.5 to 17.5.
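
The rounding of continuous outputs to table indices might be sketched as follows. Several details are assumptions rather than part of the original description: the activations are taken to be already scaled to the index ranges, halves are rounded up, and the upper end of each valid range is treated as exclusive so that every accepted value maps to an index in the table.

```python
import math

BEAM_CLASS = {1: "Simply Supported Beam", 2: "Cantilever Beam",
              3: "Propped Beam", 4: "Fixed End Beam"}      # Table 11
CROSS_SECTION = {1: "Circular", 2: "I", 3: "Channel"}       # Table 12
INERTIA = {1: "Small", 2: "Medium-Small", 3: "Medium",
           4: "Medium-Large", 5: "Large"}                   # Table 13


def to_index(value, table):
    """Round half up and reject values outside the valid range."""
    low, high = min(table) - 0.5, max(table) + 0.5
    if not low <= value < high:
        raise ValueError(f"{value} is outside the valid range {low} to {high}")
    return math.floor(value + 0.5)


def decode(outputs):
    """Translate the first three output activations into design features."""
    beam, section, inertia = outputs[:3]
    return (BEAM_CLASS[to_index(beam, BEAM_CLASS)],
            CROSS_SECTION[to_index(section, CROSS_SECTION)],
            INERTIA[to_index(inertia, INERTIA)])


design = decode([3.2, 2.8, 4.6])
```

Python's built-in `round` is avoided here because it rounds halves to the nearest even integer, which would treat 2.5 and 3.5 inconsistently at the class boundaries.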

The training set, representing "good" designs, was drawn from the pool of
design candidates. For each set of requirements, there are 60 possible solutions. With
five different design requirements, there are 31 unique combinations of design
requirements, resulting in 1,860 possible designs. Considering only the minimum cost
solution for each combination of requirements, there are 31 minimum cost designs
from the pool of 1,860. From this pool of 31 minimum cost designs, a total of twelve
different solutions were identified. In those cases where more than one good design
solution exists, the one that was least represented in the solution set was chosen to
give more variety. In several cases, there existed solutions that had only slightly
different costs, but the lower was always used.

Table 13: Moments of Inertia

Description    Index
Small          1
Medium-Small   2
Medium         3
Medium-Large   4
Large          5


A  single  hidden  layer  is  used.  The  number  of  neurons  in  this  layer  will  initially
be  set  using  equation  (1).  Since  the  number  of  hidden  nodes  and  training  pairs  will
affect  the  performance  of  the  network,  the  effect  of  training  set  size  and  number  of
hidden  nodes  on  generalization  will  be  studied  by  varying  the  number  of  hidden
nodes  in  several  of  the  subsequent  experiments.  Table  14  details  the  network
parameters  used  in  most  cases.  Several  parameters  have  two  terms  separated  by  a 
slash.  The  first  term  corresponds  to  the  value  used  with  the  Quickprop  algorithm  and 
the  second  term  is  used  with  the  SuperSAB  algorithm.  The  values  in  Table  14  were 
arrived  at  from  experience  using  the  different  network  paradigms  on  test  problems 
during  the  development  of  the  QuikProp  neural  network  simulator. 

In  general,  the  Quickprop  method  is  used  whenever  possible  since  it  appears 
more  stable.  On  problems  where  Quickprop  fails  to  converge,  SuperSAB  is  tried. 
When  neither  method  converges,  those  cases  are  identified.  Where  different 
parameters  are  used,  each  experiment  identifies  these  cases. 

According to Baum's parameters with a generalization error, ε, of 0.125
(equations (3) and (4)), this network, as configured, would require 600 training pairs.
This design problem only has 31 possible unique inputs of binary data. The
discrepancy is attributed to the lack of continuous valued input and output neurons,
and Baum's rule does not directly apply in this case. Because this problem is small, we
will be able to perform a series of experiments that test generalization capabilities with
respect to network size.

Table 14: General Network Parameters

Parameter                     Value
Initial Learning Rate, η      0.02 / 0.6
Momentum, α                   0.9 / 0.1
Weight Initialization Range   -0.3 < w < 0.3
Maximum Activation            1.0
Error Function                atanh()
Sigmoid Prime Shift           0.1
Learning Dynamics             Quickprop / SuperSAB
Maximum Growth Factor         1.75
Maximum Learning Rate         5.0
Weight Decay, x               0.001

Measuring  Network  Performance 

Each network is trained until the pattern error, E_p, for each pattern, p, is less
than a set pattern tolerance value and the system error, E, is less than a set system
tolerance value. The pattern error is defined as

$$E_p = \frac{1}{2}\sum_{j=1}^{J}\left(t_j - o_j\right)^2 \qquad (5)$$

where there are J neurons in the output layer; t_j are the target output activation values,
and o_j are the calculated output activation values. The system error is the average
pattern error for all training pairs.

$$E = \frac{1}{P}\sum_{p=1}^{P} E_p \qquad (6)$$

Learning performance of the networks is measured using the recall error measure,
which is the number of incorrect patterns recalled from the entire training set divided
by the total number of training patterns, P.

$$\text{RecallError} = \frac{\text{NumberIncorrectlyRecalled}}{P} \qquad (7)$$

Generalization performance is measured using the generalization error, which is the
number of test patterns incorrectly recalled divided by the number of test patterns.

$$\text{GeneralizationError} = \frac{\text{NumberIncorrectlyRecalled}}{\text{NumberTestPatterns}} \qquad (8)$$
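
The recall and generalization error measures, together with the stopping rule based on pattern and system tolerances, can be sketched as below. The function names are illustrative, and the pattern errors and counts passed in are invented values used only to exercise the code.

```python
def error_rate(num_incorrect: int, num_patterns: int) -> float:
    """Fraction of patterns recalled incorrectly (recall or generalization)."""
    if num_patterns <= 0:
        raise ValueError("num_patterns must be positive")
    return num_incorrect / num_patterns


def has_converged(pattern_errors, pattern_tol: float, system_tol: float) -> bool:
    """True when every pattern error and their average are within tolerance."""
    system_error = sum(pattern_errors) / len(pattern_errors)
    return max(pattern_errors) < pattern_tol and system_error < system_tol


# Illustrative counts: 3 of 24 training patterns and 2 of 7 test patterns wrong.
recall_error = error_rate(3, 24)
generalization_error = error_rate(2, 7)
converged = has_converged([0.02, 0.08, 0.01], pattern_tol=0.1, system_tol=0.05)
```

Because both measures are simple ratios, a test set of seven patterns can only yield generalization errors in steps of one seventh (0.14, 0.29, 0.43, and so on).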

Recall-Test  1 

This test examines the ability of the network to learn all 31 training cases. The
networks have 5 input neurons representing the design requirements and 3 output
neurons representing a design solution's features. The cost output neuron is not
considered in this test. The goal is to determine the effect of the number of hidden
nodes on the ability to learn. Equation (1) indicates that 4 hidden neurons should be
used. With 31 training cases and 3 output neurons, there are 93 known target values.
Using more than 6 hidden nodes will create an underdetermined problem. This test
starts with a network with 4 hidden neurons. All the networks in this test are trained
for a maximum of 50,000 epochs or until convergence is achieved. Convergence is
defined by a system error of 0.05 and a pattern error of 0.1. The results are shown in
Table 15.
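
The overdetermined versus underdetermined argument amounts to comparing the number of trainable weights with the number of target values in the training set. A small sketch of that comparison follows; the rule of 15 weights per hidden neuron is read off the Weights column of the result tables for this 5-input, 3-output network and is not a general formula.

```python
def weight_count(hidden: int) -> int:
    """Weights for the 5-input, 3-output network: 15 per hidden neuron,
    the pattern read off the Weights column of the result tables."""
    return 15 * hidden


def determinacy(hidden: int, training_pairs: int, outputs: int = 3) -> str:
    known = training_pairs * outputs   # one target value per output per pair
    unknown = weight_count(hidden)     # one unknown per trainable weight
    if unknown < known:
        return "overdetermined"
    if unknown > known:
        return "underdetermined"
    return "determined"


summary = {h: determinacy(h, training_pairs=31) for h in (4, 5, 6, 8, 10, 15)}
```

With all 31 training pairs, networks of up to 6 hidden neurons (90 weights) fall on the overdetermined side, while 8 or more hidden neurons give more weights than target values.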

The  recall  error  indicates  how  well  the  network  can  exactly  recall  any  given 
learned  pattern.  Smaller  numbers  indicate  better  recall  characteristics.  There  are  no
additional  test  pairs  that  can  be  used  to  check  generalization  errors.  This  is  done  in
subsequent  tests.


Table 15: Results for Recall Test 1

Method      Hidden Neurons   Weights   Epochs   Recall Error
Quickprop   4                60        50000    0.23
Quickprop   5                75        50000    0.16
Quickprop   5                75        10632    0.06
Quickprop   5                75        50000    0.10
Quickprop   5                75        50000    0.10
Quickprop   6                90        26391    0.03
Quickprop   6                90        33673    0.06
Quickprop   6                90        60000    0.52
Quickprop   6-8              90-120    68121    0.03
Quickprop   8                120       9253     0.06
Quickprop   10               150       5167     0.06
Quickprop   15               225       9738     0.06

Overdetermined  problems  will  generally  not  exhibit  perfect  recall  and 
underdetermined  problems  will  either  have  no  solution  or  more  than  one  solution.  The 
number  of  epochs  required  to  train  the  network  is  indicative  of  the  training  difficulty, 
the  saturation  of  the  connections,  and  the  quality  of  the  starting  point  in  weight  space. 
Using  too  few  hidden  neurons  has  the  danger  of  saturating  the  weight  values.  A 
network  that  cannot  learn  given  examples  typically  has  too  few  hidden  neurons.  A 
network  that  easily  learns  the  examples  might  be  too  large  and  might  not  generalize 
well. The network could not properly train with 4 hidden neurons.¹ One of the
networks using 6 hidden neurons has the lowest recall error, but another 6 hidden
neuron network did not converge after 60,000 epochs and as a result could not recall
52% of the training patterns. The poorly performing 6 hidden neuron network was
stopped while the weights were still undergoing significant changes. The 6 hidden
neuron network that performs well has simply converged to the best local minimum.
Better solution points still may exist in this overdetermined problem.

¹ Neither Quickprop nor SuperSAB converged on the four hidden node problem within
50,000 epochs.

In  an  effort  to  further  reduce  the  recall  error,  the  6  hidden  neuron  network  that 
had  the  lowest  recall  error  was  modified  to  have  8  hidden  neurons.  This  was  done  by 
extending  the  matrix  of  weights  for  the  6  hidden  neuron  network  with  very  small 
weight  values  such  that  the  recall  would  not  change.  The  pattern  tolerance  was 
tightened  to  0.06,  and  the  system  tolerance  was  tightened  to  0.02.  This  network  was 
then  restarted  and  met  the  tightened  tolerances  after  68,121  epochs.  The  single  pattern 
that  could  not  be  recalled  in  the  6  hidden  neuron  network  could  also  not  be  recalled  in 
the  expanded  network.  The  design  requirements  for  this  case  called  for  low 
maintenance,  weight,  displacements,  and  stresses.  The  desired  solution  was  for  a  fixed 
end  beam,  a  channel  cross  section,  and  a  large  moment  of  inertia.  The  cost  of  this 
solution  was  15.4.  The  recalled  pattern  from  both  networks  was  for  a  propped 
cantilever  beam,  a  channel  cross  section,  and  a  large  moment  of  inertia.  The  cost  for 
this  solution  was  15.7.  Both  recalled  solutions  satisfy  the  design  requirements. 
Although  the  propped  cantilever  solution's  cost  is  close  to  the  desired  solution's  cost, 
the  network  has  not  perfectly  learned  the  desired  mapping  from  requirements  to  design 
features. 
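
Extending a trained network with additional hidden neurons, as done above when going from 6 to 8, can be sketched with plain lists. Everything about the storage layout here is an assumption made for illustration (one row of input weights per hidden neuron, a bias weight stored last in each row), as are the function name and the magnitude chosen for the "very small" weights.

```python
import random


def expand_hidden_layer(w_in, w_out, extra, scale=1e-4, seed=0):
    """Add `extra` hidden neurons with near-zero weights.

    w_in[h]  holds the weights from the inputs (plus bias) into hidden neuron h.
    w_out[k] holds the weights from the hidden neurons (plus bias, stored
    last) into output neuron k. Copies are returned; the inputs are unchanged.
    """
    rng = random.Random(seed)

    def small():
        return rng.uniform(-scale, scale)

    n_in = len(w_in[0])
    new_in = [row[:] for row in w_in]
    new_in += [[small() for _ in range(n_in)] for _ in range(extra)]
    # Insert the new hidden-to-output weights just before each bias term.
    new_out = [row[:-1] + [small() for _ in range(extra)] + row[-1:]
               for row in w_out]
    return new_in, new_out


# Hypothetical 6-hidden-neuron network: 5 inputs + bias, 3 outputs.
w_in = [[0.1] * 6 for _ in range(6)]
w_out = [[0.2] * 7 for _ in range(3)]
w_in8, w_out8 = expand_hidden_layer(w_in, w_out, extra=2)
```

Because the new hidden-to-output weights are close to zero, the added neurons contribute almost nothing to the outputs at restart, so the recall of the original network is preserved while training continues.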

The  networks  using  more  than  6  hidden  neurons  were  solving  an 
underdetermined  problem  where  multiple  solutions  to  the  weights  could  exist.  Multiple 
solutions  would  make  convergence  easier  since  any  one  random  starting  point  could 
be  close  to  a  solution.  This  appears  true  since  the  networks  using  8,  10,  and  15  hidden
neurons  did  not  have  trouble  converging  to  solutions.  The  network's  ability  to  learn  is
clearly  facilitated  by  increasing  the  number  of  hidden  neurons;  however,  the  effect  of
the  number  of  hidden  neurons  on  the  generalization  error  remains  to  be  evaluated.  The
generalization  error  will  be  examined  in  the  next  several  tests.

Generalization— Test  2 

This  test  examines  a  network's  ability  to  generalize.  A  broad  spectrum  of  the 
training  pairs  is  chosen.  Of  the  31  possible  training  pairs,  7  are  withheld
to  test  generalization,  and  the  remaining  24  are  used  to  train  the  network.  The  same
number  of  network  input  and  output  neurons  are  used  in  this  test  as  those  for  test  1. 
Using  24  training  pairs  results  in  72  known  values.  Using  4  hidden  neurons  results  in 
an  overdetermined  problem  and  using  5  or  more  hidden  neurons  results  in 
underdetermined  problems.  Four  different  runs  are  made  using  5  hidden  neurons,  and 
two  runs  are  done  with  6  hidden  neurons.  A  maximum  of  50,000  epochs  are 
performed  with  a  pattern  tolerance  of  0.1  and  a  system  tolerance  of  0.05.  The  learning 
dynamics  use  both  Quickprop  and  SuperSAB  in  these  experiments.  The  results  are 
shown  in  Table  16. 

This example brings up several important points concerning not only the
performance of the networks, but also issues regarding interpretation of results with
respect to the problem domain. From test 1 it seems apparent that the network will not
train using 5 hidden neurons; however, these networks did train because removing
seven training pairs from the complete set of 31 examples has two consequences. First,
it reduces the amount of information that must be learned, reducing the saturation
level. Second, and perhaps more importantly in the design domain, it has the potential
to eliminate conflicting examples that would contribute to a more difficult error
surface. Reducing the amount of information that the network must learn helps reduce
the potential for weight saturation. When weight saturation occurs, small changes in
the weights cause significant changes in neuron response; thus, the network cannot
learn the required information.


Table 16: Results for Generalization Test 2

Method      Hidden Neurons   Weights   Epochs   Recall Error   Generalization Error
Quickprop   4                60        44659    0.13           0.86
Quickprop   4                60        50000    0.00           0.86
Quickprop   4                60        50000    0.08           0.57
Quickprop   5                75        34404    0.08           0.29
SuperSAB    5                75        2223     0.13           0.43
Quickprop   5                75        39228    0.08           0.14
SuperSAB    5                75        3698     0.13           0.00
Quickprop   6                90        18435    0.13           0.29
SuperSAB    6                90        50000    0.04           0.29

The drastically faster learning times of SuperSAB in this test suggest that, although
SuperSAB has more difficulty with problems that have very steep gradients, which are
characteristic of difficult problems, it is significantly faster than even
Quickprop for easier problems. The importance of the second consequence is
measured by the generalization error. Low generalization errors may intimate that the
network has identified an underlying mapping from the input vector space of design
requirements to an output vector space of design alternatives.

Interpretation of the generalization error requires some analysis. The complete
space of design alternatives consists of 1,860 possible designs for 31 combinations of
design requirements. For each set of design requirements, there are 60 possible design
configurations. Each design is characterized by a resulting cost that incorporates each
considered design requirement. Thus, for each set of design requirements, the 60
possible design configurations may be ranked according to cost. The lowest cost
design is the "ideal" design. Preliminary design is not characterized by a single ideal
design but by a collection of feasible designs, where each design satisfies the given
design requirements. Satisfying requirements here means not violating them. Each set
of 60 designs for a set of given requirements is feasible, but the costs of these 60
designs can range widely and even include artifacts with identical and nearly identical
costs. Therefore, the generalization error must be interpreted with respect to the
preliminary design domain, since the mapping should not identify single "ideal" designs
as is the case for recall operations.

In  the  preliminary  design  domain,  network  performance  should  go  beyond 
simple  recall.  Artificial  neural  networks  for  preliminary  design  need  to  identify 
promising  feasible  preliminary  designs;  therefore,  the  generalization  error  is  with 
respect  to  good  feasible  designs,  which  is  a  subjective  measurement.  For  this  problem, 
a  good  feasible  design  is  arbitrarily  defined  to  be  one  that  is  within  15%  of  the  cost  of 
the  expected  test  case. 
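
The 15% criterion can be written as a small predicate. The sketch below assumes a symmetric band around the expected cost, which is one reading of "within 15%"; apart from the 15.7 and 15.4 costs quoted earlier for the propped cantilever and fixed end designs, the cost pairs are made-up values for illustration.

```python
def is_good_feasible(cost: float, expected_cost: float, tolerance: float = 0.15) -> bool:
    """True when cost lies within `tolerance` (as a fraction) of expected_cost."""
    return abs(cost - expected_cost) <= tolerance * expected_cost


def generalization_error(pairs, tolerance: float = 0.15) -> float:
    """Fraction of (network cost, expected cost) pairs that are not good designs."""
    misses = sum(not is_good_feasible(c, e, tolerance) for c, e in pairs)
    return misses / len(pairs)


# Costs quoted for the propped cantilever (15.7) and fixed end (15.4) designs;
# the other pairs are made-up values for illustration only.
close_enough = is_good_feasible(15.7, 15.4)
error = generalization_error([(15.7, 15.4), (9.0, 6.0), (4.1, 4.0)])
```

Under this reading, a design costing 15.7 against an expected 15.4 counts as a good feasible design, since the difference is about 2% of the expected cost.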

The generalization errors for this series of examples range from a poorly
performing network with a generalization error of 86% to a perfectly performing
network (generalization error of 0%). The learning dynamics employed do not have
any influence on the accuracy of recall or generalization, provided the networks
converge. One of the SuperSAB runs, using 6 hidden neurons, did not converge but
still performed well. The network in this case was very close to converging but had not
technically met convergence criteria. This is also true for the second 4 hidden neuron
network.

More  consistent  generalization  errors  are  desirable,  but  this  also  emphasizes 
the  need  to  withhold  some  training  data  for  use  in  testing  network  accuracy.  Without 
some  means  for  testing,  there  are  no  assurances  that  the  network  will  perform  to 
expectations.  A  consistently  poorly  performing  network  may  be  indicative  of  poor
problem  choice,  formulation,  or  training  criteria.  Even  though  several  of  the  networks 
had  low  recall  errors  (less  than  10%),  the  generalization  errors  still  could  be  quite 
high.  Where  the  generalization  error  is  high,  the  network  has  not  learned  the  desired 
mapping  from  design  requirements  to  design  features  for  the  set  of  test  cases.  Without 
the  ability  to  perform  generalization  tests,  confidence  in  neural  networks  for  design 
should  be  low.  The  next  test  will  examine  the  effect  of  using  a  larger  than  suggested 
number  of  hidden  neurons. 

Generalization— Test  3 

This series of runs uses a larger number of hidden neurons to test the
generalization abilities of neural networks based on higher order underdetermined
problems. A series of runs is made using 8 and 10 hidden neurons with both
Quickprop and SuperSAB to test the accuracy of generalization. The same seven test
pairs are withheld from the training set in order to test generalization. A maximum of
50,000 epochs is performed with a pattern tolerance of 0.1 and a system tolerance of
0.05.
The results are shown in Table 17.

Table 17: Results from Generalization Test 3

Method      Hidden Neurons   Weights   Epochs   Recall Error   Generalization Error
Quickprop   8                120       12000    0.08           0.00
SuperSAB    8                120       270      0.08           0.14
Quickprop   8                120       13359    0.04           0.29
SuperSAB    8                120       2014     0.17           0.14
Quickprop   10               150       9959     0.08           0.29
SuperSAB    10               150       2507     0.08           0.14

From these results and the previous test results, this design problem's
generalization characteristics do not change much between using 5, 6, 8, or 10 hidden
nodes. This corresponds to the results of Morgan [Morgan90], which indicate that a range of
hidden layer sizes produces acceptable generalization errors. Underdetermined
problems can have multiple solutions (or no solution), but the network's ability to
generalize in these cases is only measurable from the test data. With an
underdetermined problem, there is a chance that the network has overfit
the training data. In overdetermined problems, an exact solution is unlikely, so the
network is more likely to approximate the data. Regardless, a poor approximation of
the data is not necessarily better than overfitting the data. The next set of tests
examines the effects of slightly reformulating the problem by including a continuous
valued output neuron representing a cost estimate.


Recall— Test  4 

This test examines the recall and learning capabilities of a network
configuration that is essentially the same as in the previous three tests, but it includes an
additional output neuron that estimates the cost. This additional output neuron
represents a continuous valued output with a range from 0 to 17. This test explores the
network size issue with respect to learning and recall capabilities. All 31 training pairs
are presented to networks with various size hidden layers. A maximum of 50,000
epochs is performed with a pattern tolerance of 0.1 and a system tolerance of 0.05.
The results are shown in Table 18.

The recall error includes two values. The numbers in parentheses indicate
the recall error including the cost estimate. The cost estimate was not expected to be
exact but within 15% of the exact value. The values outside the
parentheses indicate the recall error without considering the cost estimate. The cost
estimate may be ignored as a reliable recall value but is still useful in learning as a
further discriminator between solutions. With the cost output as a further
discriminator, the networks in general exhibit a lower recall error than those from
recall test 1.


Table 18: Results for Recall Test 4

Method      Hidden Nodes   Weights   Epochs   Recall Error
Quickprop   4              80        50000    0.19 (0.45)
Quickprop   5              100       50000    0.06 (0.29)
Quickprop   6              120       50000    0.13 (0.35)
Quickprop   8              160       14753    0.00 (0.19)
Quickprop   10             200       9844     0.00 (0.12)
Quickprop   15             300       8811     0.03 (0.12)

These results illustrate two important points concerning difficult problems.
First, additional neurons that further discriminate solution sets are useful in defining
the solution space. They provide additional error signals and weights that allow for
further discrimination. The second point is that weights can quickly become saturated
by expanding the number of input or output neurons. Oftentimes, increasing the size of
either the input or output layer requires changing the size of the hidden layers. The
number of unknown values increases by the addition of new input and output neurons.
The next series of tests examines the generalization abilities of this network and the
effects of the cost discriminator neuron on generalization.

Generalization— Test  5 

This set of runs examines the effectiveness of using 5 and 6 hidden neurons on
the generalization error. Table 19 displays the results of this series of generalization
tests. These networks had problems with convergence. With 4 output neurons and 24
training pairs, there were 96 known values. The networks with 5 hidden neurons were
overdetermined and the networks with 6 hidden neurons were underdetermined. A
maximum of 50,000 epochs is performed with a pattern tolerance of 0.1 and a system
tolerance of 0.05. Only one network converged. Recall errors are large in these
networks due to the inability of the networks to approximate the design costs to within
15% of the correct value. Networks that have not converged may still perform well,
although unreliably. Even networks that achieve convergence may not generalize well
since they may just be performing a simple lookup due to the network size.

Table 19: Results from Generalization Test 5

Method      Hidden Neurons   Weights   Epochs   Recall Error   Generalization Error
Quickprop   5                80        50000    0.33           0.14
SuperSAB    5                80        50000    0.58           0.14
Quickprop   6                100       41732    0.33           0.14
SuperSAB    6                100       50000    0.79           0.28
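
The determinacy argument used in this test can be checked with simple arithmetic. The short Python sketch below compares the number of known values (output neurons times training pairs) with the weight counts listed in Table 19. The function names are illustrative assumptions, and the comparison is only the counting heuristic described in the text, not a formal test of solvability.

```python
def known_values(n_outputs, n_training_pairs):
    """Number of target values the training set supplies."""
    return n_outputs * n_training_pairs


def determinacy(n_weights, n_known):
    """Compare the unknowns (weights) with the known target values."""
    if n_weights < n_known:
        return "overdetermined"
    if n_weights > n_known:
        return "underdetermined"
    return "determined"


known = known_values(n_outputs=4, n_training_pairs=24)
print(known)                    # 96
print(determinacy(80, known))   # overdetermined (5 hidden neurons)
print(determinacy(100, known))  # underdetermined (6 hidden neurons)
```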

Generalization— Test  6 

This last set of networks demonstrates the generalization potential of networks
configured with either 8 or 10 hidden neurons for this design problem. A maximum of
50,000 epochs is performed with a pattern tolerance of 0.1 and a system tolerance of
0.05. As Table 20 reveals, all networks converged, and good generalization properties
appear for networks with either 6 (from Test 5) or 8 hidden neurons. Most networks
could recall the proper design features, but the recall errors are high in this test due to
the inability of the networks to properly approximate the resulting design costs.


Table 20: Results for Generalization Test 6

Method      Hidden Neurons   Weights   Epochs   Recall Error   Generalization Error
Quickprop   8                160       8749     0.25           0.14
SuperSAB    8                160       2581     0.42           0.14
Quickprop   10               200       19685    0.21           0.29
SuperSAB    10               200       5168     0.25           0.43

Evaluation 

Backpropagation  type  networks  have  the  ability  to  learn  sets  of  training  pairs 
and  demonstrate  good  generalization;  however,  they  do  not  do  so  in  a  consistent 
manner. Good recall and generalization capabilities appeared for both overdetermined
and underdetermined network configurations, but poor results also occurred. This
phenomenon is analogous to curve fitting problems. An apparently good approximation
to a curve in a region is no better than results obtained from a curve that could be
overfit in the same region. The current standard procedure for measuring quality of
learning  is  the  generalization  error  from  a  statistically  representative  set  of  known 
input-output  mappings  that  are  not  part  of  the  training  set. 
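
A minimal sketch of such a holdout procedure is given below, using the 31 available pairs with 7 withheld as in the generalization tests. Choosing the withheld pairs at random with a fixed seed is an assumption made here; the text does not say how the seven test pairs were selected, and the integer identifiers merely stand in for the actual design pairs.

```python
import random


def holdout_split(pairs, n_test, seed=0):
    """Withhold `n_test` randomly chosen pairs for testing and train on the rest."""
    if not 0 < n_test < len(pairs):
        raise ValueError("n_test must leave at least one pair on each side")
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder identifiers standing in for the 31 beam design pairs.
all_pairs = list(range(31))
train, test = holdout_split(all_pairs, n_test=7)
print(len(train), len(test))  # 24 7
```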

The SuperSAB algorithm is very sensitive to the characteristics of weight
space as defined by the network topology and the characteristics and size of the training
set. SuperSAB is much faster than Quickprop on those problems where it did
converge; however, for the recall problems where 31 training pairs are presented to
the network, SuperSAB could not effectively deal with the highly nonlinear and
conflicting nature of the training data. The network parameters for the SuperSAB runs
are conservative, as are those for Quickprop. SuperSAB generally did not have
problems converging on the generalization tests, where the networks trained with 24
sets of examples and 7 example sets were withheld for testing.

These networks were formulated using stepped integers as outputs from
neurons that are continuous. For this problem, one output was stepped from 1 to 5,
another from 1 to 4, and another from 1 to 3. Different output neurons would then
vary their error signal with respect to these different ranges. Small fluctuations or
noise could then adversely affect the error signal. Stepped integer output may be
efficient in terms of limiting the number of neurons in the system, but it may adversely
affect performance. Design problems are very likely to have stepped integers that
represent abstract features. Using purely continuous variables leads to a false sense of
precision. Artificial neural networks do not perform well at generating exact answers, as
demonstrated by the prediction of design costs. Employing real numbers as output
requires a great deal of forethought pertaining to their meaning and interpretation. The
next design example investigates using binary output neurons rather than continuous
valued neurons interpreted as stepped integers to solve this same beam design
problem.
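
The sensitivity of stepped integer outputs to noise can be illustrated with a short Python sketch. It assumes that a neuron activation in the range 0 to 1 is scaled linearly onto the integers 1 to n and then rounded; this scaling and the function names are assumptions for illustration, since the exact mapping used by the simulator is not given here.

```python
def decode_stepped(activation, levels):
    """Map an activation in [0, 1] onto the integers 1..levels by linear scaling and rounding."""
    scaled = 1 + activation * (levels - 1)
    return int(min(levels, max(1, round(scaled))))


def activation_margin(levels):
    """Largest activation error that still rounds to the intended integer."""
    return 0.5 / (levels - 1)


# The three stepped outputs of the beam problem range over 1-5, 1-4, and 1-3.
for levels in (5, 4, 3):
    print(levels, round(activation_margin(levels), 3))

# The same activation error (0.13) changes the 5-level answer but not the 3-level one.
print(decode_stepped(0.50, 5), decode_stepped(0.63, 5))  # 3 4
print(decode_stepped(0.50, 3), decode_stepped(0.63, 3))  # 2 2
```

Under this assumed scaling, an output with more levels leaves a smaller margin per level, so the same amount of noise is more likely to change the decoded integer.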

Binary Beam Design Example

This design problem is identical to the previous beam design problem, but a
different network formulation is used. Instead of using stepped integers for some of
the output neurons, binary output neurons are used. In addition, the continuous valued
output neuron that represents cost is not included. The first set of output neurons
represents the solution class, which has a range from 1 to 4. The first two binary
neurons are in this set. The second set of output neurons represents the type of
cross-section, which has a range from 1 to 3. The next two binary output neurons are in this
set. The third set of binary output neurons represents the qualitative size of the moment
of inertia of the beam, which has a range from 1 to 5. The last three binary output
neurons are in this set.

The  binary  character  of  the  output  neurons  allows  for  three  sets  of  undefined 
neuron  outputs.  There  are  no  limitations  on  the  network  for  activating  these 
combinations.  There  are  127  possible  output  combinations  with  only  31  defined  in  the 
solution  space.  There  is  nothing  other  than  rigorous  training  to  prevent  the  network 
from  activating  these  other  combinations.  Tight  error  tolerances  could  prevent  these 
combinations  from  being  recalled;  however,  this  could  also  cause  the  network  to 
overtrain  and  be  useful  for  only  recall  operations. 

The  first  test  inspects  the  recall  characteristics  of  this  binary  neural  network 
with  respect  to  the  number  of  hidden  neurons.  Subsequent  tests  build  upon  these 
results  to  explore  the  generalization  error  of  this  formulation.  An  evaluation  of  overall 
performance  follows  the  results  from  this  series  of  tests. 

Recall-Test  1 

This test examines the recall and learning capabilities of solely binary networks,
that is, networks in which all input and output neurons can take on one of two possible
states. This test explores the network size issue with respect to learning and recall
capabilities. All 31 training pairs are presented to networks with various size hidden layers. A
maximum of 50,000 epochs is performed with a pattern tolerance of 0.1 and a system
tolerance of 0.05. The results are shown in Table 21.


Table 21: Results from Recall Test 1

Method      Hidden Neurons   Weights   Epochs   Recall Error
Quickprop   4                140       50000    0.00
Quickprop   5                175       50000    0.00
Quickprop   6                210       13022    0.00
Quickprop   8                280       8850     0.00
Quickprop   10               350       5832     0.00

An interesting observation from the results of this test is that the recall error
is zero for all tests, including those that did not converge. This fact raises the question
as to the magnitudes of the specified error tolerances, which were 0.05 for the system
tolerance and 0.10 for the pattern tolerance. If these tolerances are too tight, then the
system will overtrain and not perform well on generalization tests. Judging from the
exceptional recall errors, the RMS error may not be the best measure for binary
neuron problems such as this. The next test directly examines the generalization error
using the same network configuration with binary output neurons.
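
The following Python sketch illustrates how a binary network can show zero recall error while still failing the convergence tolerances. The definitions used here are assumptions for illustration: the pattern tolerance is taken as a bound on the largest absolute output error, and the system tolerance as a bound on the RMS error over all outputs. The simulator's exact criteria are not restated in this section, and the activations are hypothetical.

```python
import math


def rms_error(outputs, targets):
    """Root-mean-square error over every output of every pattern."""
    squared = [(o - t) ** 2 for out, tgt in zip(outputs, targets) for o, t in zip(out, tgt)]
    return math.sqrt(sum(squared) / len(squared))


def max_pattern_error(outputs, targets):
    """Largest absolute output error found on any pattern."""
    return max(abs(o - t) for out, tgt in zip(outputs, targets) for o, t in zip(out, tgt))


def recall_error(outputs, targets):
    """Fraction of patterns whose rounded outputs differ from the binary targets."""
    wrong = sum(1 for out, tgt in zip(outputs, targets)
                if [1 if o >= 0.5 else 0 for o in out] != list(tgt))
    return wrong / len(targets)


# Hypothetical activations: every output is on the correct side of 0.5,
# but several are farther than 0.1 from their targets.
targets = [[1, 0, 1], [0, 1, 0]]
outputs = [[0.80, 0.15, 0.95], [0.05, 0.70, 0.20]]

converged = max_pattern_error(outputs, targets) <= 0.10 and rms_error(outputs, targets) <= 0.05
print(recall_error(outputs, targets), converged)  # 0.0 False
```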

Generalization— Test  2 

This  test  examines  the  generalization  capabilities  of  the  binary  networks 
presented  in  the  previous  test.  This  test  explores  the  network  size  issue  with  respect  to 
generalization  capabilities.  Of  the  31  training  pairs  available,  7  are  not  presented 
during  training  and  are  used  to  test  the  generalization  capabilities.  A  maximum  of 
50,000 epochs are performed with a pattern tolerance of 0.1 and a system tolerance of
0.05. The results are shown in Table 22.

Table 22: Results for Generalization Test 2

Method      Hidden Neurons   Weights   Epochs   Recall Error   Generalization Error
Quickprop   4                140       50000    0.04           0.00
Quickprop   5                175       20008    0.00           0.00
Quickprop   6                210       12516    0.00           0.00
Quickprop   8                280       5500     0.00           0.29
Quickprop   10               350       6900     0.00           0.29

Evaluation 

The nature of binary output neurons allows for more leeway in the output.
Regardless of the binary output, the neuron activation functions are continuous and
produce output values between 0 and 1. Therefore, output values are rounded to the nearest
value of either 0 or 1. On the other hand, using stepped integers requires scaling the
output and then rounding to the nearest integer. In doing so, precision suffers, and
the process imparts more variation into the output, possibly making the tolerances
harder to meet. Using multiple binary outputs to represent stepped integers distributes
the representation of the output over multiple neurons. As a result, accuracy is
improved at the cost of increased computational effort.
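
The rounding of binary outputs can be sketched in Python as follows. The grouping of the seven output neurons (two for solution class, two for cross-section type, three for moment of inertia size) follows the formulation described for this example. Reading each group as a plain binary number, most significant bit first, is an assumption, as are the activation values; the text does not give the actual code assignment.

```python
def decode_binary_group(activations):
    """Round each activation to 0 or 1 and read the bits as an integer, most significant bit first."""
    value = 0
    for a in activations:
        bit = 1 if a >= 0.5 else 0
        value = value * 2 + bit
    return value


# Hypothetical activations for the seven binary output neurons.
outputs = [0.91, 0.12, 0.08, 0.77, 0.64, 0.03, 0.88]
groups = (outputs[0:2], outputs[2:4], outputs[4:7])
print([decode_binary_group(g) for g in groups])  # [2, 1, 5]
```

Each neuron only has to land on the correct side of 0.5, which is the leeway referred to above.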

With  7  output  neurons  and  24  training  pairs,  there  are  217  known  values.  The 
networks  with  4,  5,  and  6  hidden  neurons  are  overdetermined,  and  the  networks  must 
approximate  a  solution  since  there  is  in  general  no  solution.  These  networks  should 
generalize  well  and  do  so.  The  networks  with  8  and  10  hidden  neurons  are 
underdetermined  with  possibly  multiple  solutions.  These  networks  are  less  likely  to 
generalize well.

Frame Design Example

This  example  attempts  to  be  more  representative  of  typical  preliminary  design 
problems  where  the  entire  solution  space  has  not  been  fully  explored.  As  a  result,  the 
training  set  is  limited  and  subjective.  The  problem  involves  determination  of  the 
geometry  of  a  stable  frame  to  support  given  loads.  Input  to  the  network  consists  of 
available  pinned  support  locations  and  locations  of  applied  loads.  The  network  output 
is  the  member  framing  that  supports  the  loads.  The  problem  is  defined  by  a  grid  of 
possible  member  end  locations,  load  points,  and  pinned  supports  as  shown  in  Figure 
41.  All  input  and  output  neurons  are  binary  indicating  a  feature  is  either  present  (1)  or 
absent  (0).  The  numbers  enclosed  in  circles  in  Figure  41  indicate  input  neuron  numbers 
for  support  locations,  and  the  numbers  enclosed  in  rectangles  in  Figure  41  indicate 
input  neuron  numbers  for  force  locations.  The  direction  of  force  is  assumed  to  have 
horizontal  and  vertical  components  but  the  magnitudes  are  not  considered.  The 
numbers  that  are  not  enclosed  in  Figure  41  indicate  the  output  neuron  numbers 
symbolizing  member  locations.  There  are  a  total  of  12  input  neurons  and  20  output 
neurons  for  this  problem. 
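
As a sketch of this binary representation, the Python fragment below builds a 12-element input vector and reads member locations from a 20-element output vector. The neuron numbering is defined by Figure 41, which is not reproduced here, so the particular numbers used (supports at input neurons 1 to 3, a load at input neuron 7) are hypothetical, as are the function names and the 0.5 threshold.

```python
N_INPUTS = 12   # support and load location neurons (from the text)
N_OUTPUTS = 20  # candidate member neurons (from the text)


def encode_inputs(active_neurons, size=N_INPUTS):
    """Return a binary vector with 1 at each listed neuron number (1-based) and 0 elsewhere."""
    vector = [0] * size
    for number in active_neurons:
        if not 1 <= number <= size:
            raise ValueError(f"neuron number {number} is outside 1..{size}")
        vector[number - 1] = 1
    return vector


def decode_members(activations, threshold=0.5):
    """List the 1-based output neuron numbers whose activation reaches the threshold."""
    return [i + 1 for i, a in enumerate(activations) if a >= threshold]


# Hypothetical case: three supports present and one load applied.
x = encode_inputs([1, 2, 3, 7])
print(x)  # [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

# Hypothetical network output in which members 2 and 5 are active.
y = [0.0] * N_OUTPUTS
y[1], y[4] = 0.93, 0.71
print(decode_members(y))  # [2, 5]
```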

The training set is shown in Figure 42 and Figure 43. It is a very limited
training set, as most realistic preliminary design training sets would be. There are
obviously many possible frame configurations that would support applied loads at
different locations. Not all possible frame configurations are conceived since rarely will
all possible designs make up a training set. Conceptually, there are an infinite number
of designs that would satisfice a set of design requirements. Therefore, for practical
purposes, a very limited training set was developed. This training set will particularly
challenge the generalization capabilities since it only covers loads being applied at each
of the possible load locations and all three supports. The training set does not consider
combinations of loads at different locations or the elimination of some supports.


Figure 41: Frame Input Definitions

Two networks were set up with one hidden layer. Network A had 15 hidden
neurons. Equation (1) was used to determine this value. There are 3600 weights in
network A. Network B had 3 hidden neurons. There are 720 weights in network B.
The network parameters used for training are shown in Table 23. Only the Quickprop
algorithm was used for this test. The network parameters are essentially the same as
for the previous two examples. The weight initialization range is larger, for no other
reason than to see if the network would perform well with larger initial weights.
Numerically, larger initial random weights will generally increase the magnitude of the
gradients.

Figure 42: Training Cases 1-6

Figure 43: Training Cases 7-9


Table 23: Network Parameters

Learning Rate, η               0.02
Momentum, α                    1.0
Weight Initialization Range    -2 < w < 2
Maximum Activation             1.0
Error Function                 atanh()
Sigmoid Prime Shift            0.1
Learning Dynamics              Quickprop
Maximum Growth Factor          1.75
Weight Decay, x                0.0001
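
To indicate how the learning rate and maximum growth factor in Table 23 enter the weight update, the following Python fragment gives a simplified, single-weight sketch of the Quickprop idea: a secant (parabola) step toward the estimated minimum, limited by the growth factor. It is not the simulator's implementation. Momentum, weight decay, the sigmoid prime shift, and the atanh error function are omitted, and falling back to a plain gradient step when the estimated curvature is not positive is a simplification chosen here rather than part of the published algorithm.

```python
def quickprop_step(slope, prev_slope, prev_step, lr=0.02, mu=1.75):
    """One simplified Quickprop-style update for a single weight.

    slope and prev_slope are dE/dw at the current and previous weight values;
    prev_step is the previous weight change.
    """
    if prev_step == 0.0:
        return -lr * slope                        # plain gradient step to get started
    curvature = (slope - prev_slope) / prev_step  # secant estimate of d2E/dw2
    if curvature <= 0.0:
        return -lr * slope                        # no minimum on the fitted parabola
    step = -slope / curvature                     # jump to the parabola's minimum
    limit = mu * abs(prev_step)                   # maximum growth factor caps the step
    return max(-limit, min(limit, step))


def minimize(gradient, w, iterations=30):
    """Apply the update repeatedly to a one-dimensional error function."""
    prev_slope, prev_step = 0.0, 0.0
    for _ in range(iterations):
        slope = gradient(w)
        step = quickprop_step(slope, prev_slope, prev_step)
        w += step
        prev_slope, prev_step = slope, step
    return w


# Illustrative error function E(w) = w**2, whose gradient is 2w.
w_final = minimize(lambda w: 2.0 * w, w=1.0)
print(abs(w_final) < 1e-9)
```

On this quadratic example the early steps are limited by the growth factor, growing by at most 1.75 times per update, until the parabola step falls within the limit and lands near the minimum.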

Network A required 2119 epochs to train within the specified pattern tolerance
of 0.1 and the system tolerance of 0.05. The recall error was 0.0. Network B required
36381 epochs to train within the specified pattern tolerance of 0.1 and the system
tolerance of 0.05. The recall error was 0.0.

Twelve  test  runs  were  made  using  different  combinations  of  input  parameters. 
Multiple loads were applied at various locations, and in several test cases, the number
of available supports was reduced. These cases were not represented in any explicit
way  in  the  training  set,  and  only  through  extraordinary  generalization  to  implicitly 
learn  the  concept  of  stability  would  the  network  be  able  to  perform  well  on  these  types 
of  problems.  The  network  did  generalize  to  some  degree  when  multiple  loads  were 
presented  since  each  of  the  different  training  pairs  presented  the  loads  at  different 
locations.  The  results  from  network  A  are  shown  in  Figure  44,  and  the  results  from 
network  B  are  shown  in  Figure  45. 

In Figure 44, test case 9 shows a structure with two dashed members. These
members are dashed to indicate that the corresponding output neurons were almost
activated. In most cases, the output neuron activations were clearly either on or off.
The  limited  training  set  imposes  severe  hardships  on  the  performance  of  both  neural 
networks  since  the  scope  of  the  examples  is  so  limited.  Obviously,  the  networks  did 
not  learn  the  basic  concept  of  stability  since  several  solutions  are  mechanisms  and 
several  examples  have  unsupported  forces.  On  the  whole,  network  A  could  solve  8  of 
the  12  test  problems,  and  network  B  could  solve  only  5  of  the  12  test  cases. 

Both networks were underdetermined. The 9 training pairs produce only 180
known values. Network A has 3600 unknown values and network B has 720 unknown
values. More training pairs that were representative of the desired solutions would
likely help these networks generate better answers.

Figure 44: Network A Results

Figure 45: Network B Results

This training set is subjective. Subjectivity is prevalent in design and arises due
to the nature of requirements and satisfaction of requirements. Satisfaction of
conflicting requirements requires a tradeoff, and multiple solutions may satisfy any
given requirement. Training sets that consist of subjective data will not produce
objective results, and they may obscure relevant characteristics between mappings
such that poor generalizations result. Network A performed reasonably well given so
few training examples, but the network results are not promising. The networks did
not learn the concept of stability, given the design problem as represented, nor is there
any way to reliably measure what the networks have learned.

Summary 

This chapter presents two preliminary structural design examples. Both example
problems  are  relatively  simple  in  terms  of  preliminary  structural  design,  but  the  types 
of  problems  are  characteristic  of  design.  The  first  example  demonstrated  that  neural 
networks  can  learn  a  mapping  from  design  requirements  to  design  features  even  when 
there  are  conflicting  design  requirements.  The  second  example  demonstrated  the 
limitation  of  neural  networks  in  learning  from  limited  numbers  of  example  solutions. 

The  beam  design  example  allowed  for  a  complete  exploration  and 
measurement  of  network  performance.  It  showed  the  importance  of  test  or 
confirmation  data  sets  to  verify  network  performance,  but  good  network  recall  and 
generalization  performance  is  not  reliable.  The  frame  design  example  showed  that 
neural  networks  can  produce  some  reasonable  results  when  limited  training  data  is 
available,  but  again,  these  results  are  not  reliable.  The  networks  are  not  learning 
concepts  as  humans  do,  but  instead,  they  attempt  to  find  approximations  based  on 
examples.  If  the  examples  are  limited  in  number  and  diversity,  then  in  essence  the 
networks are extrapolating. These extrapolations are generally unreliable.

While running and developing the beam design example, it became evident that
the type of neuron has a significant effect on training. These networks performed
best when using binary output neurons. Using stepped integer output was more
difficult since the error signal was diluted due to the activation range of the neurons.
Continuous valued outputs should be used with caution since they imply a certain
precision. A related problem that arose is how to measure learning convergence. The
diverse nature of design problems places different demands on the accuracy of
learning.

Standard  backpropagation  was  not  used  in  these  examples.  Two  hybrid 
backpropagation  models,  Quickprop  and  SuperSAB,  were  used.  Quickprop  was  more 
stable  than  SuperSAB,  but  SuperSAB  displayed  some  remarkable  training  times.  The 
network  parameters  for  the  Quickprop  algorithm  are  conservative  and  could  be  used 
for  almost  any  problem.  The  SuperSAB  algorithm  as  implemented  needs  further 
refinement. 


CONCLUSIONS 

This  study  focused  on  applying  artificial  neural  networks  to  automated 
preliminary  design.  Many  computational  models  of  design  have  emerged  in  the  past 
decade  using  various  paradigms  of  artificial  intelligence  and  numerical  methods.  The 
evaluation  of  neural  networks  in  this  field  has  been  lacking,  and  this  work  serves  to 
close  that  gap  by  illustrating  and  researching  design  issues  that  appear  well  suited  for 
neural  network  solutions. 

Neural  networks  are  relatively  young  as  a  computing  paradigm  and  are  often 
misapplied,  and  expectations  are  often  unrealistic.  A  great  deal  of  research  from  many 
fields,  particularly  computer  science  and  psychology,  has  been  directed  at  development 
of neural systems that emulate and explore human cognition. This work concentrates
on the use of neural networks for automating design and on identifying their
limitations.

At the least, neural networks provide an alternative paradigm for researching
difficult problems. Most neural networks are founded on a sound mathematical
background but have particular biological plausibility constraints that often force
alternative viewpoints. Examination of difficult automated design problems from
unconventional directions has provided insights into computational design models.

Summary

This  work  explored  the  application  of  artificial  neural  networks  to  automated 
preliminary  structural  design.  It  first  examined  current  issues  in  automated  design,  and 
identified  promising  areas  where  neural  networks  could  be  applied.  A  general 
overview  of  artificial  neural  networks  provided  a  foundation  for  associating  design 
methodology  and  neural  networks.  Two  neural  network  paradigms  were  explored  to 
solve  different  identified  bottlenecks  to  automating  design  tasks.  The  first  network 
paradigm  explored  a  neural  network  system  for  performing  qualitative  analysis  of 
preliminary  designs.  The  second  network  paradigm  was  proposed  to  overcome 
knowledge  acquisition  and  representation  difficulties  characteristic  of  knowledge 
based  systems  for  design. 

The  first  network  model,  Harmony  theory  networks,  was  designed  as  a  system 
for  dealing  with  large  numbers  of  constraints.  A  model  for  qualitative  analysis  of 
preliminary  structural  systems  was  developed  and  encoded  using  several  Harmony 
theory  networks.  Basic  structural  principles  were  coded  as  constraints,  and  Harmony 
theory  networks  were  created  to  perform  qualitative  analysis  of  basic  preliminary 
structural  designs. 

Several  important  general  characteristics  of  neural  networks  became  evident 
from  development  and  performance  of  Harmony  theory  networks.  First,  they  can  solve 
constraint  problems  in  such  a  way  that  as  many  constraints  are  satisfied  as  possible. 
When incomplete problems were presented to these networks, the problems were
solved in a consistent manner such that as much missing information as possible was
completed by the network. As a result, these networks could perform with incomplete
and conflicting requirements by producing consistent answers.

Harmony theory networks use stochastic activation functions and simulated
annealing to avoid local minima in the solution space. They essentially use a hill
climbing  optimization  process  to  find  solution  states  with  maximum  numbers  of 
constraints  being  satisfied.  The  stochastic  nature  of  this  process  resulted  in  long 
execution  times. 
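
The role of temperature in a stochastic activation function can be illustrated with the short Python sketch below. It uses a generic logistic rule of the kind found in stochastic, annealed networks; the exact form used in the Harmony theory networks of this work is not restated here, so the function, its argument names, and the sample temperatures are assumptions.

```python
import math


def activation_probability(harmony_gain, temperature):
    """Probability that a stochastic unit switches on, given the harmony gained by doing so."""
    x = harmony_gain / temperature
    # Two algebraically equal forms are used so that exp() never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# As the temperature is lowered, the same harmony gain yields a more deterministic decision.
for temperature in (10.0, 1.0, 0.1):
    print(temperature, round(activation_probability(1.0, temperature), 3))
```

At high temperature the unit behaves almost randomly, which allows escapes from poor states, while at low temperature it nearly always takes the state that increases harmony. Many such stochastic updates are needed at each temperature, which is in keeping with the long execution times noted above.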

The  chosen  neural  network  simulator  lacked  automated  learning  capabilities  for 
Harmony  theory  networks.  As  a  result,  qualitative  knowledge  was  hand  coded  into  the 
networks.  The  qualitative  state  space  explored  was  limited  to  simple  relative 
relationships  between  triples  of  design  variables.  The  simple  qualitative  state  space 
made  developing  the  networks  feasible,  but  combined  with  long  execution  times,  the 
size  and  scope  of  design  problems  that  could  be  solved  were  limited.  Design  problems 
of  significant  scope  appear  to  warrant  a  more  expressive  qualitative  description. 

Feedforward, backpropagation type networks were examined next for the
purpose of automatically encoding, organizing, and representing design knowledge. Since
backpropagation  networks  suffer  from  long  training  times  and  scaling  problems, 
modifications  to  the  gradient  descent  technique  that  standard  backpropagation  uses 
were  investigated  and  implemented  in  a  neural  network  simulator.  The  simulator  was 
designed  and  developed  using  object-oriented  programming  techniques  that  facilitated 
the  simulator's  implementation  and  subsequent  modifications. 

This network simulator was used to develop several neural networks that
mapped from design requirements to artifacts. The first test explored the feasibility of
learning and generalizing from good design examples. The second test examined the
effect of using binary output neurons on the problems from the first test. The third test
examined the ability of the networks to learn from a limited example set. In all
cases, preliminary designs were considered such that the networks were to generate
design concepts.

Evaluation 
Finding  solutions  to  preliminary  structural  design  problems  using  artificial 
neural  networks  appeared  promising  for  the  following  reasons: 

1.  Typical design problems do not have closed form solutions. The abstract level
of  preliminary  design  does  not  always  require  numerical  results.  Neural 
networks  have  been  used  to  solve  such  problems. 

2.  A  preliminary  design  problem  may  be  cast  as  a  vector  mapping  of  design 
requirements  to  some  level  of  abstract  design  features.  Neural  networks  can 
map  an  input  vector  space  to  an  output  vector  space. 

3.  Designers  use  many  types  of  disparate  knowledge,  which  is  difficult  to 
represent  and  encode  using  knowledge  based  methods.  Neural  networks 
appear  to  offer  methods  to  automate  encoding  and  representing  different  types 
of  knowledge. 

4.  Design problems are often underspecified and involve multiple conflicting
requirements.  Neural  networks  have  been  used  to  solve  constraint  satisfaction 
problems. 

The  Harmony  theory  networks  demonstrated  in  this  work  did  perform  a  limited 
qualitative  analysis  of  abstract,  preliminary  design  problems.  Abstractions  of  design 
features  helped  define  the  preliminary  design  problems.  The  abstractions  used  in  this 
work  are  common  to  engineering  analysis  problems,  and  they  naturally  led  to 
qualitative  relationships  between  design  variables.  The  qualitative  relationships  used 
were derived from basic physical relationships used in engineering, and the knowledge
that these networks utilized was explicitly represented. These networks could always
produce  answers  even  when  incomplete  design  requirements  were  specified. 

The  usefulness  of  Harmony  theory  networks  was  restricted  by  the  lack  of 
automated  learning,  the  circumscribed  qualitative  state  space  used  in  the  analysis,  the 
specific nature of each network required for each analysis situation, and relatively
slow  execution  times.  The  lack  of  automated  learning  in  these  networks  made 
encoding  of  qualitative  relationships  a  tedious  and  error  prone  endeavor.  This  limited 
the  networks'  size  and  complexity.  The  circumscribed  qualitative  state  space  only 
allowed  increasing,  decreasing,  and  unchanging  relationships  between  three  variables 
in  any  qualitative  relationship.  More  expressive  qualitative  representations  are  required 
in  design  in  order  to  perform  relative  comparisons  between  solutions.  The  specific 
nature  of  each  network  made  each  network  applicable  to  only  one  design  situation. 
Results  from  each  network  and  knowledge  encoded  into  each  network  apply  to  only 
that  specific  abstract  preliminary  design  problem  the  network  was  developed  to  solve. 
Finally,  Harmony  theory  networks  are  slow  due  to  the  hill  climbing  method  they  use  to 
search  the  solution  space. 
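
To make the restricted state space concrete, the sketch below encodes the three allowed qualitative values and one relationship among three variables. It is a hypothetical illustration of the kind of relation involved, not the hand-coded network knowledge itself; the names and the choice of a product relation z = x * y for positive quantities are assumptions.

```python
INCREASING, UNCHANGING, DECREASING = 1, 0, -1


def product_trend(dx, dy):
    """Qualitative change of z = x * y for positive x and y.

    Returns INCREASING, UNCHANGING, or DECREASING, or None when the
    three-valued description cannot decide (opposing trends).
    """
    if dx == UNCHANGING:
        return dy
    if dy == UNCHANGING:
        return dx
    return dx if dx == dy else None


# Example: a mid span moment proportional to P * L. Raising the load P
# while holding the span L fixed raises the moment.
moment_trend = product_trend(INCREASING, UNCHANGING)
```

The `None` case shows the limitation noted above: with only three qualitative values, opposing trends cannot be ranked, so relative comparisons between solutions need a more expressive representation.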

Artificial  neural  networks  work  well  on  problems  that  do  not  have  an 
algorithmic,  closed  form  solution.  For  those  problems  that  possess  algorithmic 
solutions,  neural  networks  will  not  perform  as  well  since  their  learning  dynamics  will 
likely  take  longer  than  an  algorithmic  solution,  and  more  importantly,  neural  networks 
do  not  generate  exact  numerical  results.  Neural  networks  are  approximators  when  they 
perform  properly.  When  attempting  to  learn  specific  tolerances  or  other  continuous 
values in a training set, the networks typically must be trained to high tolerances over
many training pairs for good numerical results on those specific values. Depending on
the  characteristics  of  the  weight  space,  such  high  tolerances  might  be  feasible.  Most 
preliminary  design  problems  are  not  described  by  exact  numerical  thresholds  that  must 
be  met  for  feasible  designs. 

The  design  problems  presented  to  the  backpropagation  networks  in  this 
research  attempted  to  map  design  requirements  to  features  that  defined  an  artifact.  The 
neural networks could easily recall learned patterns, but their generalization capabilities
are  suspect  since  it  is  difficult  to  measure  the  quality  of  generalization.  Recall  alone  is 
not  a  sufficient  justification  for  using  artificial  neural  networks.  In  this  sense,  artificial 
neural  networks  would  act  as  a  database  of  previous  designs,  and  they  would  not 
perform  as  well  as  typical  computer  databases  since  their  training  and  recall  times 
would  be  slower  than  typical  computer  databases. 

The  quality  of  generalization  is  based  on  the  scope  and  amount  of  the  training 
data used to set up a network. Collecting and developing training sets for networks to
solve preliminary structural design problems is as much a limitation as collecting and
representing inference rules for preliminary design knowledge based systems. Artificial
neural  networks  have  not  eliminated  the  knowledge  acquisition  bottleneck  that  plagues 
knowledge  based  systems.  The  bottleneck  has  shifted  to  collecting  enough  training  sets 
that  adequately  represent  the  design  space. 

Artificial  neural  networks  can  automatically  organize  knowledge,  but  what 
knowledge  they  contain  is  immeasurable.  Artificial  neural  networks  for  preliminary 
structural  design  do  not  understand  concepts  such  as  stability  and  constitutive 
relationships. They can recall by rote similar examples at best. How design knowledge
is organized by the network and what design knowledge is contained in the network is
not available for examination. This emphasizes the need for a verification and testing
suite of design problems to ensure that a network has learned important aspects of a
design problem domain. Even if it were possible to develop a verification suite, there is
always  the  danger  of  extrapolating  beyond  what  is  implicitly  represented  in  the  training 
set. 

Neural  networks  are  robust  and  relatively  insensitive  to  noisy  input.  Design 
problems  include  conflicting  requirements,  missing  or  unavailable  information,  and 
incomplete  data.  A  neural  network  will  always  produce  some  output  regardless  of  the 
quality  of  the  input.  At  times  the  output  is  a  feasible  design,  but  when  given 
incomplete  input,  a  poorly  trained  network  is  unlikely  to  produce  acceptable  results. 

There is often a great deal of confusion concerning the parameters and
form of neural networks for practical use. This work gives some guidelines on how to
set up neural networks in terms of identifying input and output features. Concerning
the  multitude  of  network  tunable  parameters,  such  as  learning  rate  and  momentum, 
this  work  has  shown  reasonable  ranges  of  these  parameters  for  good  learning 
performance. 

Backpropagation  networks  have  two  fundamental  limitations  that  further  limit 
their  usefulness  to  preliminary  design.  First,  they  do  not  effectively  employ  incremental 
learning.  As  new  design  examples  or  output  features  become  available, 
backpropagation  networks  would  require  complete  retraining,  a  potentially  costly 
endeavor.  Second,  backpropagation  networks  only  produce  a  single  answer. 
Preliminary design relies on alternatives that each satisfy the given design
requirements to some degree. These conditions will probably limit the use of backpropagation
networks  in  preliminary  design. 

Future  Work 

The  local  computation  constraint  limits  the  application  of  computationally 
powerful,  second-order  unconstrained  optimization  processes  applied  to  learning.  The 
local  computation  constraint  is  followed  for  two  reasons.  First,  it  is  assumed  that  local 
computations  would  be  easier  to  implement  on  parallel  computing  hardware.  Second, 
it is assumed that biological neural systems use only local computations. Both of these
assumptions limit the application of artificial neural systems to easy problems that can
be learned within a reasonable time period. The first assumption is probably true, but
without available parallel hardware, this constraint imposes unreasonable restrictions.
The second assumption is only true for simple neural systems. We still do not understand
the  complete  workings  of  the  brain  and  our  own  learning  dynamics.  In  the  engineering 
design  field,  we  should  explore  all  possible  computational  avenues  available. 

By  removing  the  local  computation  constraint,  more  energy  can  be  spent 
examining  ways  to  measure  network  performance,  collecting  meaningful  training  data 
for  real-world  design  problems,  and  representing  preliminary  designs.  Extending  the 
application of neural networks to larger real-world design problems is needed to gauge
their  feasibility.  This  effort  would  hinge  on  collecting  a  large  set  of  preliminary  design 
examples  to  use  in  training  and  testing  suites.  Care  must  be  taken  to  confine  the  design 
domain  within  reasonable  bounds.  Areas  such  as  highway  bridge  design  concepts  or 
aircraft wing structures are good candidate design domains. Essentially, the
preliminary design domains must have a large body of examples and some set of well
established abstractions of not only features but design requirements.

Neural  networks  for  design  would  typically  consist  of  different  types  of 
processors,  both  binary  and  continuous  activations.  Training  these  types  of  networks  is 
difficult since a single convergence criterion does not appear to be well suited to both
types  of  neurons  and  combinations  of  neurons.  Further  work  must  be  done 
investigating  convergence  criteria  with  respect  to  those  features  inherent  in  design 
problems. 

In  general,  engineering  design  knowledge  may  not  be  well  suited  to  inclusion  in 
neural  networks.  Engineering  design  knowledge  does  not  involve  rote  recall,  but  an 
understanding  of  fundamental  concepts.  Innovative  designs  are  not  solely  generated 
from past design experiences, but are an amalgamation of life experiences. Artificial
neural  networks  can  produce  interesting  answers  to  some  problems.  Users  of  neural 
networks  must  realize  that  these  are  not  the  holy  grail  of  computing,  but  simply 
another  way  of  approaching  a  problem.  They  happen  to  emulate  certain  characteristics 
of  human  cognition,  which  makes  them  interesting  tools  from  a  standpoint  that  is 
different  from  traditional  engineering  processes. 


APPENDIX  A 

Beam  Design  Example 
This appendix derives the displacement and bending moment equations used in the
beam design example from chapter 1, provides a table of cross section properties and a
table of resulting displacements and principal axis bending stresses for that example, and
displays total cost estimates for several beam design scenarios. For all derivations, figures,
graphs, and calculations the following definitions hold. All loads are applied at mid span,
$a = b = L/2$, and the applied load, $P = 1\ \mathrm{lbf}$, is in pounds-force. The modulus
of elasticity, $E = 1\ \mathrm{lbf/in^2}$, is in pounds-force per square inch. The cross
section's moment of inertia about the principal axis, $I = 1\ \mathrm{in^4}$, is in inches to
the fourth power. The span length, $L = 10\ \mathrm{ft}$, is in feet. For graphs the values
are plotted for every foot along the span using the variable
$x = 0\ \mathrm{ft}, 1\ \mathrm{ft}, \ldots, L$.

All derivations use singularity functions and the governing differential equations of
elastic beams shown below.

$$v(x) = \text{deflection of the elastic curve} \tag{1}$$

$$\theta(x) = \frac{dv}{dx} = \text{slope of the elastic curve} \tag{2}$$

$$M(x) = EI\,\frac{d^2v}{dx^2} = \text{bending moment} \tag{3}$$

$$-V(x) = EI\,\frac{d^3v}{dx^3} = \text{shear} \tag{4}$$

$$p(x) = EI\,\frac{d^4v}{dx^4} = \text{load per unit length} \tag{5}$$
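
For reference, the singularity (Macaulay) bracket notation used in the derivations that follow obeys the standard rules below. They are stated here as the usual textbook conventions, on the assumption that this work follows them.

```latex
\langle x - a \rangle^{n} =
\begin{cases}
0, & x < a \\
(x - a)^{n}, & x \ge a
\end{cases}
\qquad n \ge 0

\int \langle x - a \rangle^{n}\, dx = \frac{\langle x - a \rangle^{n+1}}{n+1} + C,
\qquad n \ge 0

\int \langle x - a \rangle^{-1}\, dx = \langle x - a \rangle^{0} + C
```

The exponent $-1$ denotes a unit point load at $x = a$, so a concentrated load $P$ enters the load equation as $-P\langle x - a\rangle^{-1}$ and integrates to a step in the shear.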

Case  1  —  Simply  Supported  Beam 

This section derives the displacement and moment equations for the simply
supported beam shown in Figure 1.

Figure 1: Simple Beam

The boundary conditions for this beam are $M(0) = M(L) = 0$ and $v(0) = v(L) = 0$.
Equation (5) defines the beam's loading as

$$EI\,\frac{d^4v}{dx^4} = -P\,\langle x-a\rangle^{-1}.$$

Integrating this equation four successive times results in the displacement equation. Thus,

$$EI\,\frac{d^3v}{dx^3} = -P\,\langle x-a\rangle^{0} + C_1$$

$$EI\,\frac{d^2v}{dx^2} = -P\,\langle x-a\rangle^{1} + C_1 x + C_2$$

$$EI\,\frac{dv}{dx} = -\frac{P}{2}\,\langle x-a\rangle^{2} + \frac{C_1}{2}\,x^2 + C_2 x + C_3$$

$$EI\,v(x) = -\frac{P}{6}\,\langle x-a\rangle^{3} + \frac{C_1}{6}\,x^3 + \frac{C_2}{2}\,x^2 + C_3 x + C_4$$

Using the four boundary conditions and the moment and displacement equations
determines the constants of integration as

$$C_1 = \frac{Pb}{L}, \qquad C_2 = 0, \qquad C_3 = \frac{Pb}{6L}\left(b^2 - L^2\right), \qquad C_4 = 0$$

Substituting these constants into the displacement equation gives the following two
equations for each half of the beam:

$$v(x) = \frac{P}{12EI}\left(x^3 - \frac{3}{4}L^2 x\right), \quad \text{for } 0 \le x \le \frac{L}{2}, \text{ and}$$

$$v(x) = \frac{P}{12EI}\left[x^3 - \frac{3}{4}L^2 x - 2\left(x - \frac{L}{2}\right)^3\right], \quad \text{for } \frac{L}{2} \le x \le L$$

Substituting the constants into the moment equations results in the following two equations
for each half of the beam:

$$M(x) = \frac{Px}{2}, \quad \text{for } 0 \le x \le \frac{L}{2}, \quad \text{and} \quad M(x) = -\frac{P}{2}\,(x - L), \quad \text{for } \frac{L}{2} \le x \le L$$

The maximum displacement occurs at mid span and is

$$v_{max} = \frac{-PL^3}{48EI}$$

The maximum moment also occurs at mid span and is

$$M_{max} = \frac{PL}{4}$$
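
The Case 1 result can be checked numerically. The sketch below evaluates the integrated singularity-function expression directly, using the unit values stated at the start of this appendix; the function names are hypothetical and the code is only a verification aid, not part of the original analysis.

```python
def macaulay(x, a, n):
    """Singularity bracket <x - a>^n for n >= 0."""
    return (x - a) ** n if x >= a else 0.0


def simple_beam_deflection(x, P=1.0, L=10.0, E=1.0, I=1.0):
    """Deflection of a simply supported beam with a mid span point load."""
    a = b = L / 2.0
    c1 = P * b / L                               # from M(L) = 0
    c3 = P * b * (b ** 2 - L ** 2) / (6.0 * L)   # from v(L) = 0
    # C2 = C4 = 0 from M(0) = 0 and v(0) = 0.
    return (-P / 6.0 * macaulay(x, a, 3) + c1 / 6.0 * x ** 3 + c3 * x) / (E * I)


mid_span = simple_beam_deflection(5.0)
closed_form = -1.0 * 10.0 ** 3 / 48.0   # -P L^3 / (48 E I) with unit P, E, I
```

With these unit values both `mid_span` and `closed_form` evaluate to about -20.83, and the deflection vanishes at both supports.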

Case  2  —  Cantilevered  Beam 

This section derives the displacement and moment equations for the cantilevered beam
shown in Figure 2.

Figure 2: Cantilever Beam

The boundary conditions for this problem are $v(0) = 0$, $\theta(0) = 0$, $V(L) = 0$, and
$M(L) = 0$. Equation (5) defines the beam's loading as

$$EI\,\frac{d^4v}{dx^4} = -P\,\langle x-a\rangle^{-1}.$$

Integrating this equation four successive times results in the displacement equation. Thus,

$$EI\,\frac{d^3v}{dx^3} = -P\,\langle x-a\rangle^{0} + C_1$$

$$EI\,\frac{d^2v}{dx^2} = -P\,\langle x-a\rangle^{1} + C_1 x + C_2$$

$$EI\,\frac{dv}{dx} = -\frac{P}{2}\,\langle x-a\rangle^{2} + \frac{C_1}{2}\,x^2 + C_2 x + C_3$$

$$EI\,v(x) = -\frac{P}{6}\,\langle x-a\rangle^{3} + \frac{C_1}{6}\,x^3 + \frac{C_2}{2}\,x^2 + C_3 x + C_4$$

Using the four boundary conditions and the moment, shear, slope, and displacement
equations determines the constants of integration as

$$C_1 = P, \qquad C_2 = -\frac{PL}{2}, \qquad C_3 = 0, \qquad C_4 = 0$$

Substituting these constants into the displacement equation gives the following two
equations for each half of the beam:

$$v(x) = \frac{Px^2}{12EI}\,(2x - 3L), \quad \text{for } 0 \le x \le \frac{L}{2}, \text{ and}$$

$$v(x) = \frac{-PL^2}{48EI}\,(6x - L), \quad \text{for } \frac{L}{2} \le x \le L$$

Substituting the constants into the moment equations results in the following two equations
for each half of the beam:

$$M(x) = Px - \frac{PL}{2}, \quad \text{for } 0 \le x \le \frac{L}{2}, \quad \text{and} \quad M(x) = 0, \quad \text{for } \frac{L}{2} \le x \le L$$

The maximum displacement occurs at $x = L$ and is

$$v_{max} = \frac{-PL^3}{24EI}$$

The maximum moment occurs at $x = 0$ and is

$$M_{max} = \frac{-PL}{2}$$


Case  3  —  Propped  Cantilevered  Beam 

This section derives the displacement and moment equations for the propped
cantilevered beam shown in Figure 3.

Figure 3: Propped Cantilever Beam

The boundary conditions for this problem are: $v(0) = v(L) = 0$, $\theta(0) = 0$, and
$M(L) = 0$. Equation (3) defines the beam's moments as

$$EI\,\frac{d^2v}{dx^2} = M_A\,\langle x-0\rangle^{0} + R_A\,\langle x-0\rangle^{1} - P\,\langle x-a\rangle^{1}.$$

In this equation, $M_A$ is the moment at $x = 0$, and $R_A$ is the reaction at the fixed end.
Integrating this equation two successive times results in the displacement equation. Thus,

$$EI\,\frac{dv}{dx} = M_A\,\langle x-0\rangle^{1} + \frac{R_A}{2}\,\langle x-0\rangle^{2} - \frac{P}{2}\,\langle x-a\rangle^{2} + C_1$$

$$EI\,v(x) = \frac{M_A}{2}\,\langle x-0\rangle^{2} + \frac{R_A}{6}\,\langle x-0\rangle^{3} - \frac{P}{6}\,\langle x-a\rangle^{3} + C_1 x + C_2$$

Using the four boundary conditions and the moment, slope, and displacement equations
determines the constants of integration and the unknown redundants as

$$C_1 = 0, \qquad C_2 = 0, \qquad M_A = \frac{-3PL}{16}, \qquad R_A = \frac{11P}{16}$$

Substituting these terms into the displacement equation gives the following two equations
for each half of the beam:

$$v(x) = \frac{Px^2}{96EI}\,(11x - 9L), \quad \text{for } 0 \le x \le \frac{L}{2}, \text{ and}$$

$$v(x) = \frac{-P}{96EI}\left(-15Lx^2 + 5x^3 + 12xL^2 - 2L^3\right), \quad \text{for } \frac{L}{2} \le x \le L$$

Substituting the constants into the moment equations results in the following two equations
for each half of the beam:

$$M(x) = \frac{P}{16}\,(11x - 3L), \quad \text{for } 0 \le x \le \frac{L}{2}, \quad \text{and} \quad M(x) = \frac{-5P}{16}\,(x - L), \quad \text{for } \frac{L}{2} \le x \le L$$

The maximum displacement occurs at $x = 0.553L$ and is

$$v_{max} = \frac{-\sqrt{5}\,PL^3}{240EI}$$

The maximum moment occurs at $x = 0$ and is

$$M_{max} = \frac{-3PL}{16}$$
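
The location of the maximum displacement for the propped cantilever is less obvious than in the other cases, so a brute-force check is useful. The sketch below scans the displacement expression on a fine grid with unit values of P, L, E, and I; the function names and grid spacing are assumptions made for this check only.

```python
def macaulay(x, a, n):
    """Singularity bracket <x - a>^n for n >= 0."""
    return (x - a) ** n if x >= a else 0.0


def propped_deflection(x, P=1.0, L=1.0, E=1.0, I=1.0):
    """Deflection of a propped cantilever with a mid span point load."""
    m_a = -3.0 * P * L / 16.0   # fixed-end moment
    r_a = 11.0 * P / 16.0       # fixed-end reaction
    return (m_a / 2.0 * x ** 2 + r_a / 6.0 * x ** 3
            - P / 6.0 * macaulay(x, L / 2.0, 3)) / (E * I)


# Scan the span in steps of L / 10000 and pick the most negative deflection.
grid = [i / 10000.0 for i in range(10001)]
x_at_max = min(grid, key=propped_deflection)
v_max = propped_deflection(x_at_max)
```

The scan places the extreme deflection at about 0.553 of the span with a value close to $-\sqrt{5}/240$, and the deflection returns to zero at the propped end.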

Case  4  -  Fixed  End  Beam 

This section derives the displacement and moment equations for the fixed end
beam shown in Figure 4.

Figure 4: Fixed End Beam

The boundary conditions for this problem are: $v(0) = v(L) = 0$ and
$\theta(0) = \theta(L) = 0$. Equation (3) defines the beam's moments as

$$EI\,\frac{d^2v}{dx^2} = M_A\,\langle x-0\rangle^{0} + R_A\,\langle x-0\rangle^{1} - P\,\langle x-a\rangle^{1}.$$

In this equation, $M_A$ is the moment at $x = 0$, and $R_A$ is the reaction at the fixed end.
Integrating this equation two successive times results in the displacement equation. Thus,

$$EI\,\frac{dv}{dx} = M_A\,\langle x-0\rangle^{1} + \frac{R_A}{2}\,\langle x-0\rangle^{2} - \frac{P}{2}\,\langle x-a\rangle^{2} + C_1$$

$$EI\,v(x) = \frac{M_A}{2}\,\langle x-0\rangle^{2} + \frac{R_A}{6}\,\langle x-0\rangle^{3} - \frac{P}{6}\,\langle x-a\rangle^{3} + C_1 x + C_2$$

Using the four boundary conditions and the moment and slope equations determines the
constants of integration and the unknown redundants as

$$C_1 = 0, \qquad C_2 = 0, \qquad M_A = \frac{-PL}{8}, \qquad R_A = \frac{P}{2}$$

Substituting these terms into the displacement equation gives the following two equations
for each half of the beam:

$$v(x) = \frac{Px^2}{48EI}\,(4x - 3L), \quad \text{for } 0 \le x \le \frac{L}{2}, \text{ and}$$

$$v(x) = \frac{-P}{48EI}\left(-9Lx^2 + 4x^3 + 6xL^2 - L^3\right), \quad \text{for } \frac{L}{2} \le x \le L$$

Substituting these terms into the moment equations results in the following two equations
for each half of the beam:

$$M(x) = \frac{-PL}{8} + \frac{Px}{2}, \quad \text{for } 0 \le x \le \frac{L}{2}, \quad \text{and} \quad M(x) = \frac{3PL}{8} - \frac{Px}{2}, \quad \text{for } \frac{L}{2} \le x \le L$$

The maximum displacement occurs at $x = L/2$ and is

$$v_{max} = \frac{-PL^3}{192EI}$$

The maximum moment occurs at $x = 0$, $x = L/2$, and $x = L$ and is

$$M_{max} = \pm\frac{PL}{8}$$


The  previous  sections  defined  the  mechanics  of  each  design  case;  the  next  section 
lists  the  available  cross  section  properties  for  this  design  problem. 
Cross  Section  Properties 

This section lists the cross section properties used in the example from chapter 1.
This  design  problem  has  a  limited  number  of  varied  cross-sections  in  order  to  make  the 
problem  more  tractable.  Actual  design  problems  also  have  limited  resources,  such  as  a 
finite  number  of  available  beam  types;  therefore,  it  is  not  unreasonable  to  impose  such 
limitations.  Table  1  shows  the  available  circular  cross  sections  and  their  relevant 
properties.  Table  2  shows  the  available  I-shaped  cross  sections  and  their  relevant 
properties,  and  Table  3  does  the  same  for  channel  cross  sections.  The  range  and 
magnitudes of the areas and moments of inertia are roughly the same for each section type;
otherwise,  there  would  be  a  single  acceptable  cross  section  for  each  design  scenario. 


Table 1: Circular Cross Sections

Outside Diameter (in.)   Wall Thickness (in.)   Area (in^2)   Inertia (in^4)
         4.50                    0.13               1.72            4.11
         6.63                    0.25               5.01           25.47
         8.63                    0.25               6.58           57.72
        10.75                    0.38              12.22          164.67
        12.75                    0.38              14.58          279.34

Table 2: I-Shaped Cross Sections

Flange Width (in.)   Flange Thickness (in.)   Web Height (in.)   Web Thickness (in.)   Area (in^2)   Inertia (in^4)
       2.00                   0.13                   6.00                0.13              1.25            6.94
       4.00                   0.13                   8.00                0.13              2.00           21.84
       6.00                   0.25                  10.00                0.13              4.25           89.23
       8.00                   0.25                  10.00                0.38              7.75          136.33
       8.00                   0.38                  12.00                0.25              9.00          265.78

Table 3: Channel Cross Sections

Flange Width (in.)   Flange Thickness (in.)   Web Height (in.)   Web Thickness (in.)   Area (in^2)   Inertia (in^4)
       1.50                   0.13                   6.00                0.13              1.13            5.77
       2.00                   0.25                   8.00                0.25              3.00           27.69
       3.00                   0.38                  10.00                0.38              6.00           91.82
       3.00                   0.50                  10.00                0.50              8.00          124.42
       3.50                   0.50                  14.00                0.50             10.50          298.38

There are a total of sixty possible beam designs given these fifteen cross sections
and four classes of solutions as shown in chapter 1. Table 3, Table 4, and Table 5 contain
all the displacements and bending stresses for each design case and cross section
combination given $P = 1\ \mathrm{lbf}$ and $E = 1\ \mathrm{lbf/in^2}$. The tables list
results by moments of inertia in ascending order.


Table 3: Circular Cross Sections - Bending Stresses and Displacements

Bending   Bending   Bending   Bending   Displ.    Displ.    Displ.    Displ.
Case 1    Case 2    Case 3    Case 4    Case 1    Case 2    Case 3    Case 4
 16.41     32.82     12.31      8.20    5.0641   10.1282    2.1741    1.2660
  3.90      7.80      2.93      1.95    0.8178    1.6356    0.3511    0.2045
  2.24      4.48      1.68      1.12    0.3609    0.7219    0.1550    0.0902
  0.98      1.96      0.73      0.49    0.1265    0.2530    0.0543    0.0316
  0.68      1.37      0.51      0.34    0.0746    0.1492    0.0320    0.0186

Table 4: I-Shaped Cross Sections - Bending Stresses and Displacements

Bending   Bending   Bending   Bending   Displ.    Displ.    Displ.    Displ.
Case 1    Case 2    Case 3    Case 4    Case 1    Case 2    Case 3    Case 4
 13.51     27.02     10.13      6.75    3.0019    6.0038    1.2888    0.7505
  5.67     11.33      4.25      2.83    0.9540    1.9079    0.4096    0.2385
  1.77      3.53      1.32      0.88    0.2335    0.4670    0.1002    0.0584
  1.16      2.31      0.87      0.58    0.1528    0.3056    0.0656    0.0382
  0.72      1.44      0.54      0.36    0.0784    0.1568    0.0337    0.0196

Table 5: Channel Cross Sections - Bending Stresses and Displacements

Bending   Bending   Bending   Bending   Displ.    Displ.    Displ.    Displ.
Case 1    Case 2    Case 3    Case 4    Case 1    Case 2    Case 3    Case 4
 16.25     32.51     12.19      8.13    3.6121    7.2243    1.5508    0.9030
  4.60      9.21      3.45      2.30    0.7524    1.5049    0.3230    0.1881
  1.76      3.51      1.32      0.88    0.2269    0.4538    0.0974    0.0567
  1.33      2.65      0.99      0.66    0.1674    0.3349    0.0719    0.0419
  0.75      1.51      0.57      0.38    0.0698    0.1396    0.0300    0.0175
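
The tabulated bending stresses can be spot checked from the maximum moments derived earlier. The sketch below assumes the flexure formula, stress = M c / I, with c taken as half the outside diameter and the 10 ft span converted to 120 in. Neither assumption is stated in the text, but together they reproduce the first row of the circular section results to within rounding.

```python
SPAN_IN = 120.0   # L = 10 ft expressed in inches (assumed unit conversion)
LOAD_LBF = 1.0    # P = 1 lbf

# Magnitude of the maximum moment as a fraction of P * L for each case.
MOMENT_COEFFICIENT = {1: 1 / 4, 2: 1 / 2, 3: 3 / 16, 4: 1 / 8}


def bending_stress(case, outer_fiber_in, inertia_in4,
                   load=LOAD_LBF, span=SPAN_IN):
    """Maximum bending stress from the flexure formula M * c / I."""
    moment = MOMENT_COEFFICIENT[case] * load * span
    return moment * outer_fiber_in / inertia_in4


# Smallest circular section from Table 1: 4.50 in. diameter, I = 4.11 in^4.
stresses = [bending_stress(case, 4.50 / 2.0, 4.11) for case in (1, 2, 3, 4)]
tabulated = [16.41, 32.82, 12.31, 8.20]
```

The small remaining differences are consistent with the moment of inertia having been rounded to two decimals in Table 1.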

Normalizing  the  displacements  using  the  minimum  displacement  from  the  three 
types  of  cross  sections  and  the  bending  stresses  using  the  minimum  bending  stress  from 
the  three  types  of  cross  sections  provides  "cost  factors"  for  displacements  and  bending 
stresses,  respectively.  These  values  are  then  combined  with  the  other  costs  for  weight, 
fabrication,  and  maintenance.  Scale  factors  applied  to  the  weight,  fabrication,  and 
maintenance  costs  account  for  the  differences  in  magnitudes  between  the  costs  and  thus 
the  relative  importance  of  the  different  requirements. 
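
The normalization and scaling just described can be sketched as follows. The text does not give the exact rule for combining the cost terms, so the simple weighted sum here is an assumption, as are the function names and the weight, fabrication, and maintenance values used in the example call.

```python
def cost_factors(values):
    """Normalize a list of values by its minimum, so the best value is 1.0."""
    smallest = min(values)
    return [v / smallest for v in values]


def total_cost(displacement_factor, stress_factor, other_costs, scale_factors):
    """Combine cost factors with scaled weight, fabrication, and maintenance
    costs. A plain sum is assumed here for illustration."""
    scaled = sum(s * c for s, c in zip(scale_factors, other_costs))
    return displacement_factor + stress_factor + scaled


# Case 3 displacements of the largest circular, I-shaped, and channel sections.
factors = cost_factors([0.0320, 0.0337, 0.0300])

# Hypothetical weight, fabrication, and maintenance costs; weight scaled by ten.
example_total = total_cost(1.0, 1.5, [2.0, 3.0, 4.0], [10.0, 1.0, 1.0])
```

Raising one scale factor changes which terms dominate the total, which is how the emphasis on a given requirement is expressed.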

Without  scaling  the  weight,  fabrication,  and  maintenance  performance  costs,  the 
displacement  and  bending  requirements  dominate,  and  the  propped  cantilever  (case  3) 
using  the  largest  I-shape  section  has  the  lowest  cost  (17.00)  and  thus  highest  performance 
index. Figure 5 shows a plot of the cost contours and the region of maximum
performance. It should be noted that the fixed end beam (case 4) using either of the two
largest I-shaped sections was very close to the case 3 solution.


Figure 5: Cost Contours - Propped Cantilever Beam (Displacement and Bending Goals Stressed)
[Contour plot; panels: Circular, I-Shaped, Channel; marked point: Minimum Cost (17.00)]

Several other design test cases were checked by varying the emphasis on weight,
manufacturing, and fabrication requirements. Emphasizing the importance of the weight
requirement by scaling the weight costs by ten results in a case 4 (fixed end beam) solution
using the lightest I-section and is shown in Figure 6. If the manufacturing or fabrication
requirements are stressed by scaling the respective costs by 100, then cantilevered
solutions result as shown in Figure 7 and Figure 8. When weight, manufacturing, and
fabrication requirements are all scaled by a factor of ten, then a simple beam using a
moderately sized I-section results in lowest cost (Figure 9). A propped cantilevered
solution is also very close.

Figure 6: Cost Contours - Fixed End Beam (Weight Goals Stressed)
[Contour plot; panels: Circular, I-Shaped, Channel]

Summary 

This appendix has presented a tractable beam design example that clearly illustrates
interactions between design requirements. Depending on how much emphasis designers
place on a problem's different design requirements, different designs will result. Thus, one
of the most important steps that a designer must take is to identify and clarify not only
design goals but their importance and interactions.

Figure 7: Cost Contours - Cantilever Beam (Manufacturing Goals Stressed)
[Contour plot; panels: Circular, I-Shaped, Channel]

Figure 8: Cost Contours - Cantilever Beam (Fabrication Goals Stressed)
[Contour plot; panels: Circular, I-Shaped, Channel]

Figure 9: Cost Contours - Simple Beam
[Contour plot; panels: Circular, I-Shaped, Channel]


APPENDIX  B 

QuikProp  Implementation 

This  appendix  describes  each  of  the  classes  that  make  up  QuikProp.  Each  class  is 
made up of member data (i.e., attributes) and member functions (i.e., behavior). The
member  data  is  a  collection  of  data  elements  that  consist  of  built-in  data  types  such  as 
integers  and  other  classes.  Member  functions  are  procedures  that  define  how  a  class 
interacts  with  its  environment. 
CNet 

CNet  is  an  abstract  class  derived  from  CObject  from  which  each  network  paradigm 
is  derived.  It  contains  virtual  functions  that  each  network  should  specialize  and  basic 
member  functions  and  data  that  are  shared  among  all  networks.  Table  1  describes  the 
member  data,  and  Table  2  describes  the  member  functions  of  the  CNet  class. 

Table 1: CNet Member Data

Data Type   Data Name        Description
char        cMode            network execution mode
char *      szNetName        used as base name for files
int         nInpLayer        size of the input layer
int         nOutLayer        size of the output layer
long        nMaxCycles       maximum number of cycles to run
long        nSaveCycles      number of cycles to run between automatically saving system data
long        nEpochNum        last completed epoch number
double      dTolerance       allowable average output layer error for convergence
double      dPatternTol      minimum allowable pattern output error for convergence
double      dMean            arithmetic mean of change in weights and biases
double      dStdDev          standard deviation of change in weights and biases
double      dCorr            correlation of weights and biases
double      dWghtMagnitude   average magnitude of weights
double      dThreshhold      pattern error threshold
double      dMajority        percent of patterns below dThreshhold for majority converged
double      dTolDecay        fraction to reduce dThreshhold after dMajority achieved
char        fCalcStats       flag is TRUE when training statistics calculated and displayed
char        fStatsOut        flag is TRUE when training statistics are to be saved
char        fConverged       flag is TRUE when all patterns are less than dPatternTol
char        fSmartLearning   flag is TRUE when difficult patterns are presented more often
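
One plausible reading of how the tolerance fields in Table 1 interact is sketched below. This is an interpretation of the table descriptions only, not the QuikProp source: the function names, the exact comparisons, and the treatment of dMajority as a fraction between 0 and 1 are assumptions.

```python
def has_converged(pattern_errors, tolerance, pattern_tol):
    """True when the average error meets `tolerance` (cf. dTolerance) and
    every pattern error is below `pattern_tol` (cf. dPatternTol)."""
    average = sum(pattern_errors) / len(pattern_errors)
    return average <= tolerance and all(e < pattern_tol for e in pattern_errors)


def majority_converged(pattern_errors, threshold, majority):
    """True when at least the fraction `majority` (cf. dMajority) of the
    patterns has an error below `threshold` (cf. dThreshhold)."""
    below = sum(1 for e in pattern_errors if e < threshold)
    return below / len(pattern_errors) >= majority


def decay_threshold(threshold, tol_decay):
    """Reduce the threshold by the fraction `tol_decay` (cf. dTolDecay)."""
    return threshold * (1.0 - tol_decay)


# Hypothetical per-pattern output errors after one training cycle.
errors = [0.02, 0.04, 0.03, 0.20]
```

Under this reading, a network can reach a majority-converged state while one difficult pattern still prevents full convergence, at which point the threshold would be tightened.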

Table  2:  CNet  Member  Functions 


Name 

Description 

Virtual 

SaveWeights 

saves  network  weights 

Y 

LoadWeights 

loads  network  weights 

Y 

SetMode 

sets  the  network  execution  mode  flag 

N 

GetMode 

gets  the  network  execution  mode  flag 

N 

EncodePattern 

Presents  a  pattern  pair  for  learning 

Y 

RecallPattern 

Retrieves  a  pattern  based  on  some  input 

Y 

ReadPatterns 

reads  all  pattern  pairs  from  a  network's  training  file 

Y 

Cycle 

presents  each  pattern  pair  to  the  network  one  time 

Y 

Train 

continuous  cycling  through  pattern  pairs 

Y 

Test 

calculates  correctness  of  training 

Y 

Run 

runs  a  set  of  input  patterns  calculating  output  patterns 

Y 

CheckForParallel- 
Patterns 

checks  each  input  pattern  for  parallelism  with  other  input 

patterns 

Y 

CalcStats 

computes  training  statistics 

Y 
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CBp 

The CBp class implements backpropagation-type networks through its member
functions and data. It is a subclass of CNet and specializes those inherited
member functions that it requires. Table 3 and Table 4, respectively, show the
member data and member functions of the CBp class.
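The relationship between the base class and a specializing subclass can be sketched as follows. This is an illustration only: the member names follow Table 2, but every signature, the Mode enumeration, and the toy subclass CToyBp are assumptions, since the original declarations are not reproduced here. Only a few of the virtual functions are shown.

```cpp
#include <cassert>
#include <string>

// Sketch of a network base class; signatures are assumptions.
class CNet {
public:
    enum Mode { kTrain, kTest, kRun };  // hypothetical mode values

    virtual ~CNet() = default;

    // Non-virtual in Table 2: the same for every network type.
    void SetMode(Mode m) { mode_ = m; }
    Mode GetMode() const { return mode_; }

    // Virtual in Table 2: each network type supplies its own version.
    virtual bool SaveWeights(const std::string& path) = 0;
    virtual bool LoadWeights(const std::string& path) = 0;
    virtual void Cycle() = 0;  // one pass over all pattern pairs

    // Repeats Cycle until convergence or a cycle limit (limit is assumed).
    virtual int Train(int maxCycles) {
        assert(maxCycles >= 0);
        int n = 0;
        while (n < maxCycles && !fConverged) {
            Cycle();
            ++n;
        }
        return n;
    }

protected:
    bool fConverged = false;  // listed as a char flag in the member data

private:
    Mode mode_ = kTrain;
};

// Toy subclass standing in for CBp: it overrides only what it needs.
class CToyBp : public CNet {
public:
    bool SaveWeights(const std::string&) override { return true; }
    bool LoadWeights(const std::string&) override { return true; }
    void Cycle() override {
        if (++cycles == 3) fConverged = true;  // pretend to converge
    }
    int cycles = 0;
};
```

In this sketch Train lives in the base class and calls the subclass's Cycle through the virtual table, which is one plausible reading of how the general training loop and the network-specific update are separated.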


Table 3: CBp Member Data

Data Type   Data Name                Description
---------   ----------------------   ------------------------------------------------------------
int         nLayers                  total number of layers
int*        piLayerSizes             array of hidden layer sizes
char        fEpoch                   flag to indicate epoch based training
char        fPermute                 flag to indicate permutation of pattern presentation during training
char        fMinMax                  flag is TRUE when min/max data has been read
char        fNonlinearErr            flag is TRUE when nonlinear error function is being used
char        fSecondOrder             flag is TRUE when second order approximation is being used (Quickprop)
char        fAdjustableRate          flag is TRUE when adjustable learning rate is being used
int         nNumPatterns             number of training patterns
double      dMomentum                back propagation momentum term
double      dLearnRate               back propagation learning rate
double      dInitRange               maximum value for weight initialization
double      dMaxScale                maximum activation value
double      dMinScale                minimum activation value
double      dShift                   shift for sigmoid-prime function
double      dGrowth                  maximum growth factor for Quickprop and SuperSAB
double      dWeightDecay             weight decay term for Quickprop and SuperSAB
double      dIncrease                step-size increase factor
double      dDecrease                step-size decrease factor
matrix*     pmatWeights              array of synapse weight matrices
matrix*     pmatDeltaWeights         array of changes to weight matrices
matrix*     pmatEpochDeltaWeights    array of epoch based weight error for accumulation
matrix*     pmatErrorDeriv           work array of error derivative matrices for Quickprop
matrix*     pmatEpochOldErrorDeriv   array of previous cycle's accumulated error derivative matrices
matrix*     pmatEpochNewErrorDeriv   array of current cycle's accumulated error derivative matrices
vector*     vecActivations           array of neuron activation vectors
vector*     vecNeuronErrors          array of neuron error vectors
vector*     vecBiases                array of neuron bias vectors
vector*     vecEpochNeuronErrors     array of epoch based neuron error vectors for accumulation
vecpair*    vpMinVectors             array of normalizing vectors for input and output
vecpair*    vpMaxVectors             array of normalizing vectors for input and output
vecpair*    vpPatternPairs           array of vecpairs that holds each set of training vectors
int*        piPermutation            order of pattern presentation for current epoch
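The fPermute flag and the piPermutation array indicate that the order of pattern presentation can be reshuffled for each epoch. A minimal sketch of that idea is shown below. The function MakePermutation is hypothetical, and the use of std::shuffle with a Mersenne Twister generator is an assumption; the original random number source is not described.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Builds the presentation order for one epoch.
// With fPermute false the patterns keep their file order 0, 1, 2, ...
// With fPermute true the same indices are returned in shuffled order.
std::vector<int> MakePermutation(int nNumPatterns, bool fPermute,
                                 std::mt19937& rng) {
    assert(nNumPatterns >= 0);
    std::vector<int> order(static_cast<std::size_t>(nNumPatterns));
    std::iota(order.begin(), order.end(), 0);  // fill with 0..n-1
    if (fPermute) {
        std::shuffle(order.begin(), order.end(), rng);
    }
    return order;
}
```

A training cycle would then visit the pattern pairs in the returned order, so that every pattern is still presented exactly once per epoch.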

Table 4: CBp Member Functions

Name            Description
-------------   --------------------------------------------------------------
SaveWeights     saves the weights for each layer and the epoch number to disk
LoadWeights     loads weights and last epoch number from disk
Cycle           presents each pattern pair to the network
ReadPatterns    reads all training patterns from pattern pair file
NewtonApprox    second order Newton approximation for Quickprop
EncodePattern   stores one pattern pair by adjusting weights
RecallPattern   recalls an output pattern given an input
AdjustRate      adjusts learning rates using SuperSAB
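As an illustration of the weight adjustment that EncodePattern performs, the standard back propagation update with momentum can be sketched as below. This is the textbook rule, not necessarily the exact code in CBp: the free function ApplyMomentumStep is hypothetical, and flat std::vector storage is used in place of the matrix arrays pmatWeights and pmatDeltaWeights.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One gradient step with momentum for a flat list of weights:
//   delta(t) = -dLearnRate * dE/dw + dMomentum * delta(t-1)
//   w        =  w + delta(t)
void ApplyMomentumStep(std::vector<double>& weights,
                       std::vector<double>& deltaWeights,
                       const std::vector<double>& errorDeriv,
                       double dLearnRate, double dMomentum) {
    assert(weights.size() == deltaWeights.size());
    assert(weights.size() == errorDeriv.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        deltaWeights[i] =
            -dLearnRate * errorDeriv[i] + dMomentum * deltaWeights[i];
        weights[i] += deltaWeights[i];
    }
}
```

Keeping the previous change in deltaWeights is what makes the momentum term possible, which is presumably why the class stores pmatDeltaWeights alongside pmatWeights.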

CMatrix 

The CMatrix implementation handles only double-precision matrices and the
operations on them that are important to artificial neural networks. It is not
a general matrix class; it is coded for computational efficiency and
descriptive clarity. Table 5 and Table 6, respectively, show the member data
and member functions of the CMatrix class.

Table 5: CMatrix Member Data

Data Type   Data Name   Description
---------   ---------   -------------------------------
int         nRows       number of rows in the matrix
int         nCols       number of columns in the matrix
double**    pMemory     allocated memory

Table 6: CMatrix Member Functions

Name         Description
----------   -------------------------------------
operator=    assignment operator
operator+    matrix addition
operator+=   matrix addition and assignment
operator*    matrix multiplication
operator*    scale a matrix by a constant
operator*=   scale a matrix and assign
Save         write a matrix to a disk file
Load         read a matrix from a disk file
GetRows      return number of rows in a matrix
GetColumns   return number of columns in a matrix
GetElement   returns a specified matrix element
SetElement   sets a specified matrix element
Transpose    transposes a matrix
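A reduced version of such a matrix class might look like the sketch below. It covers only a few of the functions in Table 6, and the signatures are assumptions. One deliberate substitution: storage is a flat std::vector<double> instead of the double** pMemory listed in Table 5, which keeps the example short and free of manual memory management.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class CMatrix {
public:
    CMatrix(int rows, int cols)
        : nRows(rows), nCols(cols),
          data(static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols), 0.0) {
        assert(rows > 0 && cols > 0);
    }

    int GetRows() const { return nRows; }
    int GetColumns() const { return nCols; }
    double GetElement(int r, int c) const { return data[Index(r, c)]; }
    void SetElement(int r, int c, double v) { data[Index(r, c)] = v; }

    // Matrix multiplication: (nRows x nCols) * (rhs.nRows x rhs.nCols).
    CMatrix operator*(const CMatrix& rhs) const {
        assert(nCols == rhs.nRows);
        CMatrix out(nRows, rhs.nCols);
        for (int i = 0; i < nRows; ++i)
            for (int j = 0; j < rhs.nCols; ++j)
                for (int k = 0; k < nCols; ++k)
                    out.data[out.Index(i, j)] +=
                        data[Index(i, k)] * rhs.data[rhs.Index(k, j)];
        return out;
    }

    // Scale every element by a constant.
    CMatrix operator*(double s) const {
        CMatrix out(*this);
        for (double& x : out.data) x *= s;
        return out;
    }

    CMatrix Transpose() const {
        CMatrix out(nCols, nRows);
        for (int i = 0; i < nRows; ++i)
            for (int j = 0; j < nCols; ++j)
                out.data[out.Index(j, i)] = data[Index(i, j)];
        return out;
    }

private:
    // Row-major offset with bounds checking.
    std::size_t Index(int r, int c) const {
        assert(r >= 0 && r < nRows && c >= 0 && c < nCols);
        return static_cast<std::size_t>(r) * static_cast<std::size_t>(nCols) +
               static_cast<std::size_t>(c);
    }

    int nRows;
    int nCols;
    std::vector<double> data;  // substitution for double** pMemory
};
```

The two operator* overloads are distinguished by argument type, which matches the two operator* rows in Table 6 (matrix product and scaling by a constant).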

CVector 

The CVector implementation handles only double-precision vectors and the
operations on them that are important to artificial neural networks. It is not
a general vector class; it is coded for computational efficiency and
descriptive clarity. Table 7 and Table 8, respectively, describe the member
data and member functions of the CVector class.

Table 7: CVector Member Data

Data Type   Data Name   Description
---------   ---------   ------------------
int         nLen        number of elements
double*     pMemory     allocated memory

Table 8: CVector Member Functions

Name           Description
------------   --------------------------------------------------------
operator[]     subscript operator
operator=      assignment operator
operator+      vector addition
operator+=     vector addition and assignment
operator-      vector subtraction
operator-=     vector subtraction and assignment
operator*      dot product
operator*      scale a vector
operator==     vector equality operator
Length         gets number of elements in the vector
Distance       calculates distance between two vectors
Normalize      vector normalization
MaximumValue   finds maximum element of a vector
MaximumIndex   finds index of maximum element in a vector
Scale          scales each element of a vector between two given values
Randomize      sets a vector's elements to random values
Save           write a vector to a disk file
Load           read a vector from a disk file
Sigmoid        applies sigmoid function to a vector's elements
Atanh          applies atanh function to a vector's elements
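A few of these vector operations can be sketched as follows. The signatures are assumptions, std::vector<double> replaces the raw double* pMemory of Table 7, and two behaviors are guesses about the original: Scale is read as a min-max rescale onto a given range, and Sigmoid as the logistic function 1 / (1 + exp(-x)).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

class CVector {
public:
    explicit CVector(std::vector<double> values) : data(std::move(values)) {}

    int Length() const { return static_cast<int>(data.size()); }

    double& operator[](int i) {
        assert(i >= 0 && i < Length());
        return data[static_cast<std::size_t>(i)];
    }

    // Dot product of two vectors of equal length.
    double operator*(const CVector& rhs) const {
        assert(Length() == rhs.Length());
        double sum = 0.0;
        for (std::size_t i = 0; i < data.size(); ++i)
            sum += data[i] * rhs.data[i];
        return sum;
    }

    // Index of the largest element (first one if there are ties).
    int MaximumIndex() const {
        assert(!data.empty());
        return static_cast<int>(
            std::max_element(data.begin(), data.end()) - data.begin());
    }

    // ASSUMED meaning: linear min-max rescale of the elements onto [lo, hi].
    // A constant vector is mapped to lo to avoid dividing by zero.
    void Scale(double lo, double hi) {
        assert(!data.empty());
        const auto mm = std::minmax_element(data.begin(), data.end());
        const double mn = *mm.first;
        const double mx = *mm.second;
        for (double& x : data)
            x = (mx > mn) ? lo + (x - mn) / (mx - mn) * (hi - lo) : lo;
    }

    // ASSUMED form: logistic sigmoid applied element by element.
    void Sigmoid() {
        for (double& x : data) x = 1.0 / (1.0 + std::exp(-x));
    }

private:
    std::vector<double> data;  // substitution for double* pMemory
};
```

In a forward pass, a layer's net input vector would be computed from the weight matrix and the previous activations, and Sigmoid would then turn it into the layer's activation vector in place.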

CVecPair 

The CVecPair implementation is composed of two CVector objects. This class's
primary purpose is to maintain training pairs of vectors. Table 9 and Table 10,
respectively, describe the member data and member functions of the CVecPair class.

Table 9: CVecPair Member Data

Data Type   Data Name   Description
---------   ---------   -------------
vector*     vec1        input vector
vector*     vec2        output vector

Table 10: CVecPair Member Functions

Name                 Description
------------------   -------------------------------------------------------------
operator=            assignment operator
operator==           equality operator
GetVectorOfVecpair   returns one of the vectors of a vector pair
Scale                scales each element in a vector pair between two given values
Save                 write a vector pair to a disk file
Load                 read a vector pair from a disk file